On data skewness, stragglers, and MapReduce progress indicators

COPPA, EMILIO; FINOCCHI, Irene

oai:iris.uniroma1.it:11573/854753

Authors: Emilio Coppa, Irene Finocchi
Publication date: 1 January 2015
Publisher country: USA
Abstract

We tackle the problem of predicting the performance of MapReduce applications by designing accurate progress indicators, which keep programmers informed of the percentage of computation time completed during the execution of a job. This is especially important in pay-as-you-go cloud environments, where slow jobs can be aborted to avoid excessive costs. Performance predictions can also serve as a building block for several profile-guided optimizations. Because they assume that running time depends linearly on input size, state-of-the-art techniques can be seriously harmed by data skewness, load imbalance, and straggling tasks. We thus design a novel profile-guided progress indicator, called NearestFit, that operates without the linearity assumption and in a fully online way (i.e., without resorting to profile data collected from previous executions). NearestFit exploits a careful combination of nearest neighbor regression and statistical curve fitting techniques. The fine-grained profiles required by our theoretical progress model are approximated through space- and time-efficient data streaming algorithms. We implemented NearestFit on top of Hadoop 2.6.0. An extensive empirical assessment over the Amazon EC2 platform on a variety of benchmarks shows that its accuracy is very good, even when competitors incur non-negligible errors and wide prediction fluctuations.
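
To make the idea of combining nearest neighbor regression with curve fitting more concrete, the following is a minimal illustrative sketch and not the NearestFit implementation. It assumes that each completed task is summarised as an (input size, running time) pair, that a pending task whose size falls inside the observed range is estimated from its k nearest observed sizes, and that a task outside that range is extrapolated with a power-law curve fitted in log-log space. All function names, the choice of a power-law model, and the sequential-time definition of progress (which ignores task parallelism) are assumptions made for illustration only.

```python
from math import exp, log
from statistics import mean


def knn_predict(samples, size, k=3):
    """Average the running times of the k observed tasks closest in input size."""
    nearest = sorted(samples, key=lambda s: abs(s[0] - size))[:k]
    return mean(t for _, t in nearest)


def power_law_predict(samples, size):
    """Fit time = a * size**b by least squares in log-log space, then evaluate.

    Assumes all sizes and times are strictly positive (hypothetical model choice).
    """
    xs = [log(x) for x, _ in samples]
    ys = [log(y) for _, y in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # All observed sizes are identical: no slope can be estimated.
        return exp(y_bar)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return exp(y_bar + slope * (log(size) - x_bar))


def predict_time(samples, size, k=3):
    """Use nearest neighbors inside the observed size range, the fitted curve outside it."""
    sizes = [x for x, _ in samples]
    if min(sizes) <= size <= max(sizes):
        return knn_predict(samples, size, k)
    return power_law_predict(samples, size)


def progress(completed, pending_sizes):
    """Fraction of total (sequential) task time already spent, using only online data."""
    done = sum(t for _, t in completed)
    remaining = sum(predict_time(completed, s) for s in pending_sizes)
    return done / (done + remaining)
```

In this sketch a skewed pending task (one much larger than anything seen so far) is handled by the fitted curve rather than by a nearest neighbor, which would underestimate its running time. Under a linear-time assumption the same task would also be underestimated whenever running time grows superlinearly with input size.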

Last time updated on 12/11/2016

This paper was published in Archivio della ricerca- Università di Roma La Sapienza.
