SHC: Distributed Query Processing for Non-Relational Data Store

Weiqing Yang, Bikas Saha, Yongyang Yu, Yanbo Liang, Mingjie Tang

doi:10.1109/icde.2018.00165

Abstract

We introduce a simple data model for applying relational operations to non-relational data, and SHC (Apache Spark - Apache HBase Connector), an implementation of this model in the Spark cluster computing framework. SHC applies optimization techniques from relational data processing to a distributed, column-oriented key-value store (i.e., HBase). Compared to existing systems, SHC makes two major contributions. First, SHC offers a much tighter integration between relational query optimization and the non-relational data store, through a plug-in implementation that integrates with Spark SQL, a distributed in-memory computing engine for relational data. This design makes the system relatively easy to maintain and enables users to perform complex data analytics on top of a key-value store. Second, SHC leverages the Spark SQL Catalyst engine for high-performance query optimization and processing, e.g., partition pruning, column pruning, predicate pushdown, and data locality. SHC has been deployed in multiple production environments with hundreds of nodes, where it provides efficient OLAP query processing over petabytes of data.
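To make the optimizations named in the abstract concrete, here is a minimal, self-contained sketch of partition pruning, column pruning, and predicate pushdown over a sorted key-value layout. It is a toy model only: `Region`, `ToyStore`, and `scan` are hypothetical names invented for illustration and are not SHC, Spark SQL, or HBase APIs, and data locality is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical stand-in for one key-range partition of a sorted store."""
    start_key: str                            # inclusive; "" means unbounded
    end_key: str                              # exclusive; "" means unbounded
    rows: dict = field(default_factory=dict)  # row key -> {column: value}


class ToyStore:
    """Toy key-value store (an assumption for illustration, not HBase)."""

    def __init__(self, regions):
        self.regions = regions
        self.regions_scanned = 0   # how many partitions were actually read
        self.cells_returned = 0    # how many column values left the store

    def scan(self, key_lo, key_hi, columns, predicate=None):
        """Return rows with key_lo <= key < key_hi, keeping only `columns`."""
        out = []
        for region in self.regions:
            # Partition pruning: skip regions whose key range cannot overlap.
            if region.end_key and region.end_key <= key_lo:
                continue
            if region.start_key >= key_hi:
                continue
            self.regions_scanned += 1
            for key in sorted(region.rows):
                if not (key_lo <= key < key_hi):
                    continue
                row = region.rows[key]
                # Predicate pushdown: the filter runs inside the store,
                # so rejected rows are never handed to the caller.
                if predicate is not None and not predicate(row):
                    continue
                # Column pruning: only the requested columns are returned.
                projected = {c: row[c] for c in columns if c in row}
                self.cells_returned += len(projected)
                out.append((key, projected))
        return out


people = {
    "row1": {"name": "alice", "age": 25, "city": "Oslo"},
    "row2": {"name": "bob", "age": 41, "city": "Lima"},
    "row3": {"name": "carol", "age": 35, "city": "Pune"},
    "row4": {"name": "dave", "age": 28, "city": "Kyiv"},
    "row5": {"name": "erin", "age": 52, "city": "Rome"},
    "row6": {"name": "frank", "age": 33, "city": "Doha"},
}

store = ToyStore([
    Region("", "row3", {k: v for k, v in people.items() if k < "row3"}),
    Region("row3", "row5", {k: v for k, v in people.items() if "row3" <= k < "row5"}),
    Region("row5", "", {k: v for k, v in people.items() if k >= "row5"}),
])

# Roughly: SELECT name FROM people
#          WHERE key >= 'row3' AND key < 'row5' AND age > 30
result = store.scan("row3", "row5", columns=["name"],
                    predicate=lambda row: row["age"] > 30)
print(result)
print(store.regions_scanned, store.cells_returned)
```

In this toy run only one of the three regions is visited and a single column value is returned, which is the kind of saving such optimizations aim for when a query's key range and filters are known before the scan starts.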

Similar Papers

Spark SQL
Michael Armbrust ... Davies Liu
27 May 2015

Performance Evaluation of Spark SQL for Batch Processing
K Anusha ... K Usha Rani
01 Jan 2020

Goldfish: In-Memory Massive Parallel Processing SQL Engine Based on Columnar Store
Jin Wang ... Guangqiang Ying
01 Dec 2016

Optimization in the catalyst optimizer of Spark SQL
Meenu Chawla ... Vinita Baniwal
Turkish Journal of Electrical Engineering & Computer Sciences | Vol. 26
28 Sep 2018
