A speculative parallel decompression algorithm on Apache Spark

Zhoukai Wang, Cuocuo Lv, Yuxiang Li, Zhong Chen, Yang Liu, Yinliang Zhao

doi:10.1007/s11227-017-2000-3

Abstract

Data decompression is one of the most important techniques in data processing and is widely used in multimedia information transmission and processing. However, existing decompression algorithms on multicore platforms are time-consuming and do not handle large data well. To expand parallelism and improve decompression efficiency on large-scale datasets, this paper proposes a speculative parallel decompression algorithm on Apache Spark, based on the software thread-level speculation technique. By analyzing the structure of the compressed data, the algorithm first uses a function to divide the compressed data into blocks that can be decompressed independently, and then spawns a number of threads to speculatively decompress the data blocks in parallel. Finally, the speculative results are merged to form the final output. Compared with the conventional parallel approach on a multicore platform, the proposed algorithm is highly efficient and achieves a high degree of parallelism by making full use of the cluster's resources. Experiments show that the proposed approach achieves an average speedup of 2.6\(\times\) over the traditional approach. In addition, as the number of worker nodes grows, the execution time decreases gradually and the speedup scales linearly. The results indicate that decompression efficiency can be significantly improved by adopting this speculative parallel algorithm.
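
The divide, decompress-in-parallel, and merge steps described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it uses a Python thread pool rather than Apache Spark, it assumes a hypothetical container format in which each block is an independent zlib stream preceded by a 4-byte length, and it omits the speculation step entirely, because the block boundaries here are known rather than guessed. The names `compress_blocks`, `split_blocks`, `parallel_decompress`, and `BLOCK_SIZE` are all made up for illustration.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # hypothetical block size chosen for this sketch


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Compress data as length-prefixed, independently decodable zlib blocks."""
    out = bytearray()
    for start in range(0, len(data), block_size):
        block = zlib.compress(data[start:start + block_size])
        out += struct.pack(">I", len(block)) + block  # 4-byte big-endian length
    return bytes(out)


def split_blocks(payload: bytes) -> list:
    """Divide step: walk the length prefixes to recover block boundaries."""
    blocks, pos = [], 0
    while pos < len(payload):
        (size,) = struct.unpack_from(">I", payload, pos)
        pos += 4
        blocks.append(payload[pos:pos + size])
        pos += size
    return blocks


def parallel_decompress(payload: bytes, workers: int = 4) -> bytes:
    """Decompress every block in a thread pool, then merge in original order."""
    blocks = split_blocks(payload)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the merge is a join.
        parts = list(pool.map(zlib.decompress, blocks))
    return b"".join(parts)
```

Calling `parallel_decompress(compress_blocks(data))` should return `data` unchanged. A speculative variant would instead guess block boundaries in a stream that lacks explicit length prefixes, validate each guess after decompression, and redo any block whose guess turned out to be wrong.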

Journal: The Journal of Supercomputing
Publication Date: Mar 21, 2017
Citations: 9

Similar Papers

ParaCA: A Speculative Parallel Crawling Approach on Apache Spark
Yuxiang Li ... Junchang Jing
01 Jan 2020

Development of Web-Based Data Preprocessing and Model Linkage Techniques for Water Cycle Analysis in Rural Watersheds (original Korean title: 농촌유역 물순환 해석을 위한 웹기반 자료 전처리 및 모형 연계 기법 개발)
Jihoon Park ... Jeong Hoon Ryu
30 Sep 2015

Data Processing and Analysis: Tools and Techniques for Big Data
Haewon Byeon
15 Jun 2023

Advances in intelligent mass spectrometry data processing technology for in vivo analysis of natural medicines
Simian Chen ... Caisheng Wu
Chinese Journal of Natural Medicines | VOL. 22
01 Oct 2024
