
On-demand big data integration. A hybrid ETL approach for reproducible scientific research

Abstract

Scientific research requires access to, and analysis and sharing of, data that is distributed across various heterogeneous data sources at the scale of the Internet. An eager extract, transform, and load (ETL) process constructs an integrated data repository as its first step, integrating and loading data in its entirety from the data sources. Bootstrapping this process is inefficient for scientific research that requires access to data from very large and typically numerous distributed data sources. A lazy ETL process loads only the metadata, but it still loads that metadata eagerly. Lazy ETL is therefore faster to bootstrap. However, queries on the integrated data repository of eager ETL perform faster, because the entire data set is available beforehand. In this paper, we propose a novel ETL approach for scientific data integration as a hybrid of the eager and lazy ETL approaches, applied to both data and metadata. This way, hybrid ETL supports incremental integration and loading of metadata and data from the data sources. We incorporate a human-in-the-loop approach to enhance the hybrid ETL, with selective data integration driven by user queries and sharing of integrated data between users. We implement our hybrid ETL approach in a prototype platform, Óbidos, and evaluate it in the context of data sharing for medical research. Óbidos outperforms both the eager and lazy ETL approaches for scientific research data integration and sharing, through its selective loading of data and metadata, while storing the integrated data in a scalable integrated data repository.
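The abstract contrasts three loading strategies. The following Python snippet is a minimal sketch of that contrast under simplifying assumptions; it is not the Óbidos implementation. The names `DataSource`, `EagerETL`, `LazyETL`, `HybridETL`, `make_sources`, and `total_fetches`, the record IDs, and the fetch counter are all hypothetical. A "remote read" stands in for network access to a data source. In this sketch the lazy variant does not retain fetched data, and the human-in-the-loop selection is reduced to a fixed list of user queries.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical remote source mapping record IDs to (metadata, data)."""
    name: str
    records: dict
    fetch_count: int = 0  # counts remote reads so strategies can be compared

    def list_ids(self):
        return list(self.records)

    def read_metadata(self, rid):
        self.fetch_count += 1
        return self.records[rid][0]

    def read_data(self, rid):
        self.fetch_count += 1
        return self.records[rid][1]


class EagerETL:
    """Eager: load all metadata and all data into the repository at bootstrap."""
    def __init__(self, sources):
        self.repo = {}
        for src in sources:
            for rid in src.list_ids():
                self.repo[(src.name, rid)] = (src.read_metadata(rid), src.read_data(rid))

    def query(self, src_name, rid):
        return self.repo[(src_name, rid)][1]  # served locally, no remote read


class LazyETL:
    """Lazy: load all metadata eagerly at bootstrap; fetch data per query."""
    def __init__(self, sources):
        self.sources = {s.name: s for s in sources}
        self.meta = {(s.name, rid): s.read_metadata(rid)
                     for s in sources for rid in s.list_ids()}

    def query(self, src_name, rid):
        # Assumption of this sketch: fetched data is not kept between queries.
        return self.sources[src_name].read_data(rid)


class HybridETL:
    """Hybrid: load metadata and data on demand, keep them in a shared repository."""
    def __init__(self, sources):
        self.sources = {s.name: s for s in sources}
        self.meta = {}
        self.repo = {}  # shared by all users, so repeated queries reuse loaded data

    def query(self, src_name, rid):
        key = (src_name, rid)
        if key not in self.repo:
            src = self.sources[src_name]
            self.meta[key] = src.read_metadata(rid)
            self.repo[key] = src.read_data(rid)
        return self.repo[key]


def make_sources():
    # One hypothetical source with 1000 records.
    return [DataSource("imaging", {i: (f"meta{i}", f"scan{i}") for i in range(1000)})]


def total_fetches(sources):
    return sum(s.fetch_count for s in sources)


# Two users query record 7; one also queries record 42.
queries = [("imaging", 7), ("imaging", 7), ("imaging", 42)]

if __name__ == "__main__":
    for cls in (EagerETL, LazyETL, HybridETL):
        srcs = make_sources()
        etl = cls(srcs)
        results = [etl.query(*q) for q in queries]
        print(cls.__name__, results, total_fetches(srcs))
```

All three strategies return the same query results, but in this toy setting they need 2000, 1003, and 4 remote reads respectively. The counts only illustrate where loading cost falls (bootstrap versus query time); they are not measurements from the paper.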

Last updated on 01/12/2022.

This paper was published in DIAL UCLouvain.
