Repository landing page

We are not able to resolve this OAI Identifier to the repository landing page. If you are the repository manager for this record, please head to the Dashboard and adjust the settings.

Building A Big Data Analytical Pipeline With Hadoop For Processing Enterprise XML Data

Abstract

The current paper shows an end-to-end approach how to process XML files in the Hadoop ecosystem. The work demonstrates a way how to handle problems faced during the analysis of a large amounts of XML files. The paper presents a completed Extract, Load and Transform (ELT) cycle, which is based on the open source software stack Apache Hadoop, which became a standard for processing of a huge amounts of data. This work shows that applying open source solutions to a particular set of problems could not be enough. In fact, most of big data processing open source tools were implemented only to address a limited number of the use cases. This work explains and shows, why exactly specific use cases may require significant extension with a self-developed multiple software components. The use case described in the paper deals with huge amounts of semi-structured XML files, which supposed to be persisted and processed daily

Similar works

This paper was published in AIS Electronic Library (AISeL).

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.