Building A Big Data Analytical Pipeline With Hadoop For Processing Enterprise XML Data

Dmitriyev, Viktor; Kruse, Felix; Precht, Hauke; Becker, Simon; Solsbach, Andreas; Marx Gómez, Jorge


oai:aisel.aisnet.org:mcis2017-1056

Publication date: 1 September 2017
Publisher: AIS Electronic Library (AISeL)

Abstract

This paper presents an end-to-end approach to processing XML files in the Hadoop ecosystem. It demonstrates how to handle the problems that arise when analyzing large numbers of XML files. The paper describes a complete Extract, Load and Transform (ELT) cycle based on Apache Hadoop, the open-source software stack that has become a standard for processing very large amounts of data. The work shows that applying off-the-shelf open-source solutions to a particular set of problems may not be enough. In fact, most open-source big data processing tools were built to address only a limited number of use cases. The work explains why specific use cases may require significant extension with multiple self-developed software components. The use case described in the paper involves very large volumes of semi-structured XML files that must be persisted and processed daily.
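The ELT ordering named in the abstract (raw XML is loaded first and transformed afterwards) can be illustrated with a small single-machine sketch. This is not the authors' Hadoop implementation: the `<orders>`/`<order>` schema and the function names `extract_and_load` and `transform` are hypothetical, and a plain Python list stands in for the Hadoop storage layer.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical sample document; element and attribute names are illustrative only.
RAW_XML = """<orders>
  <order id="1"><customer>ACME</customer><total>120.50</total></order>
  <order id="2"><customer>Globex</customer><total>75.00</total></order>
</orders>"""


def extract_and_load(raw_files: List[str]) -> List[str]:
    """Extract + Load: store each XML document unchanged (a list stands in for Hadoop storage)."""
    # In ELT the raw payload is persisted first; no parsing happens at this stage.
    return list(raw_files)


def transform(raw_store: List[str]) -> List[Dict[str, str]]:
    """Transform: parse the stored XML and flatten each <order> element into a flat row."""
    rows: List[Dict[str, str]] = []
    for doc in raw_store:
        root = ET.fromstring(doc)
        for order in root.iter("order"):
            row = {"id": order.get("id", "")}
            # Copy each direct child element's tag and trimmed text into the row.
            for child in order:
                row[child.tag] = (child.text or "").strip()
            rows.append(row)
    return rows


raw_store = extract_and_load([RAW_XML])
rows = transform(raw_store)
for row in rows:
    print(row)
```

Keeping the raw documents untouched until the transform step means the flattening logic can be changed and re-run later without re-ingesting the source files, which is the usual argument for ELT over ETL.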

This paper was published in AIS Electronic Library (AISeL).