Adaptive Big Data Pipeline

Orozco-GómezSerrano, Aldo

oai:rei.iteso.mx:11117/6322

Authors: Aldo Orozco-GómezSerrano
Publication date: 1 September 2020
Publisher: ITESO, A.C.

Abstract

Over the past three decades, data has evolved exponentially from a simple software by-product into one of companies' most important assets, used to understand their customers and foresee trends. Deep learning has demonstrated that large volumes of clean data generally provide more flexibility and accuracy when modeling a phenomenon. However, handling ever-increasing data volumes entails new challenges: the lack of expertise to select the appropriate big data tools for processing pipelines, and the speed at which engineers can reliably take such pipelines into production on the cloud. We introduce a system called Adaptive Big Data Pipelines: a platform that automates the creation of data pipelines. It provides an interface to capture the data sources, transformations, destinations, and execution schedule. The system builds the cloud infrastructure, schedules and fine-tunes the transformations, and creates the data lineage graph. It has been tested on 50-gigabyte data sets, processing them in just a few minutes without user intervention.
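The abstract does not describe the system's actual interface or internals. As a minimal sketch, assuming a pipeline is specified as named source datasets plus transformations that each read some datasets and write one, the snippet below shows how the four captured elements (sources, transformations, destinations, schedule) could yield a data lineage graph and a dependency-respecting execution order. `PipelineSpec`, `Transformation`, `lineage_graph`, and `execution_order` are hypothetical names, the cron-style schedule string is an assumption, and Python's standard `graphlib` is used for ordering; none of this is presented as the thesis's method.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass
class Transformation:
    """Hypothetical transformation step: reads some datasets, writes one."""
    name: str
    inputs: list[str]
    output: str


@dataclass
class PipelineSpec:
    """Hypothetical spec holding the four elements the abstract says are captured."""
    sources: list[str]
    transformations: list[Transformation]
    destinations: list[str]
    schedule: str  # assumed to be a cron expression, e.g. "0 2 * * *"


def lineage_graph(spec: PipelineSpec) -> dict[str, set[str]]:
    """Map every dataset to the set of datasets it is directly derived from."""
    graph: dict[str, set[str]] = {src: set() for src in spec.sources}
    for t in spec.transformations:
        graph.setdefault(t.output, set()).update(t.inputs)
        for inp in t.inputs:
            graph.setdefault(inp, set())
    return graph


def execution_order(spec: PipelineSpec) -> list[str]:
    """Order transformations so each runs after the steps that produce its inputs."""
    producer = {t.output: t.name for t in spec.transformations}
    deps = {
        t.name: {producer[i] for i in t.inputs if i in producer}
        for t in spec.transformations
    }
    return list(TopologicalSorter(deps).static_order())
```

For example, with a spec whose `enrich` step consumes the output of a `clean` step, `execution_order` returns `['clean', 'enrich']` regardless of the order in which the steps were declared. A topological sort is used here because lineage is naturally a directed acyclic graph; `TopologicalSorter` also raises `CycleError` if a spec accidentally introduces a circular dependency.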

Repositorio Institucional del ITESO

This paper was published in Repositorio Institucional del ITESO.