Robust and Skew-resistant Parallel Joins in Shared-Nothing Systems
Cheng, L.; Kotoulas, S.; Ward, T.; Theodoropoulos, G.
Authors
L. Cheng
S. Kotoulas
T. Ward
G. Theodoropoulos
Abstract
The performance of joins in parallel database management systems is critical for data-intensive operations such as querying. Since data skew is common in many applications, poorly engineered join operations result in load imbalance and performance bottlenecks. State-of-the-art methods designed to handle this problem offer significant improvements over naive implementations. However, performance could be further improved by removing the dependency on global skew knowledge and broadcasting. In this paper, we propose PRPQ (partial redistribution & partial query), an efficient and robust join algorithm for processing large-scale joins over distributed systems. We present the detailed implementation and a quantitative evaluation of our method. The experimental results demonstrate that the proposed PRPQ algorithm is robust and scalable under a wide range of skew conditions. Specifically, compared to the state-of-the-art PRPD method, we achieve a 16%-167% performance improvement and 24%-54% less network communication under different join workloads.
Citation
Cheng, L., Kotoulas, S., Ward, T., & Theodoropoulos, G. (2014). Robust and Skew-resistant Parallel Joins in Shared-Nothing Systems. In CIKM'14 : proceedings of the 23rd ACM International Conference on Information and Knowledge Management : November 3-7, 2014, Shanghai, China (1399-1408). https://doi.org/10.1145/2661829.2661888
Conference Name | 23rd ACM International Conference on Information and Knowledge Management - CIKM '14 |
---|---|
Conference Location | Shanghai, China |
Start Date | Nov 3, 2014 |
End Date | Nov 7, 2014 |
Publication Date | Nov 3, 2014 |
Deposit Date | Apr 21, 2016 |
Publicly Available Date | Apr 28, 2016 |
Pages | 1399-1408 |
Book Title | CIKM'14 : proceedings of the 23rd ACM International Conference on Information and Knowledge Management : November 3-7, 2014, Shanghai, China. |
DOI | https://doi.org/10.1145/2661829.2661888 |
Files
Accepted Conference Proceeding
(409 Kb)
PDF
Copyright Statement
© 2014 ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Long Cheng, Spyros Kotoulas, Tomas E. Ward, and Georgios Theodoropoulos. 2014. Robust and Skew-resistant Parallel Joins in Shared-Nothing Systems. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM '14). ACM, New York, NY, USA, 1399-1408. DOI=http://dx.doi.org/10.1145/2661829.2661888