Abstract

Recently, there has been an increasing demand for real-time analytics, that is, up-to-the-minute reporting on business processes that have traditionally been handled by data warehousing systems. However, traditional warehousing systems cannot handle the ever-growing data sets created by modern society. Widely used systems such as Apache Hive and Spark SQL offer excellent features, including scale-out, high availability, and flexibility. Nevertheless, these systems still have drawbacks: for example, Apache Hive has high latency, while Spark SQL consumes too much memory during processing. To meet this demand and address these problems, we have designed and implemented Goldfish, an in-memory massively parallel processing (MPP) SQL engine based on a columnar store, which achieves low latency and low memory consumption during query processing. Goldfish makes two main contributions. First, it implements an efficient, compressed, distributed in-memory columnar storage engine with special indices that make full use of main memory. Second, it includes a shared-nothing MPP compute engine that pipelines tasks instead of executing them stage by stage. This paper presents the novel design of Goldfish, including the in-memory columnar storage engine, the fast data import module, query processing, fault tolerance, an efficient intermediate data structure with high serialization and deserialization performance, and several optimizations we considered to improve query performance. An extensive performance study compares our system with Hive-on-Tez and Spark SQL using a TPC-H-like benchmark and reports that Goldfish is 10x faster than Spark SQL on scan queries.
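The abstract contrasts pipelined execution with stage-by-stage execution but gives no details of how Goldfish implements either. The following is a minimal single-process sketch, not Goldfish's implementation. It assumes a simple iterator (generator) model, and the operator names `scan`, `filter_rows`, `project`, the `rows_read` counter, and the TPC-H-flavoured sample columns are all hypothetical. It shows only the data-flow difference.

In the stage-by-stage version, every stage materializes its full output before the next stage starts. In the pipelined version, each row flows through all operators before the next row is read.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]  # hypothetical row representation


def scan(table: List[Row], stats: Dict[str, int]) -> Iterator[Row]:
    """Produce rows one at a time, counting how many are read."""
    for row in table:
        stats["rows_read"] += 1
        yield row


def filter_rows(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Pass through only the rows that satisfy the predicate."""
    for row in rows:
        if pred(row):
            yield row


def project(rows: Iterable[Row], cols: List[str]) -> Iterator[Row]:
    """Keep only the requested columns of each row."""
    for row in rows:
        yield {c: row[c] for c in cols}


def run_stage_by_stage(table: List[Row], pred: Callable[[Row], bool],
                       cols: List[str], stats: Dict[str, int]) -> List[Row]:
    # Each stage builds its complete intermediate list before the next begins.
    scanned = list(scan(table, stats))
    filtered = [r for r in scanned if pred(r)]
    return [{c: r[c] for c in cols} for r in filtered]


def run_pipelined(table: List[Row], pred: Callable[[Row], bool],
                  cols: List[str], stats: Dict[str, int]) -> Iterator[Row]:
    # Operators are chained generators; no intermediate list is built, and
    # rows are read from the table only as downstream consumers demand them.
    return project(filter_rows(scan(table, stats), pred), cols)


# Hypothetical sample data loosely modelled on a TPC-H-style table.
table = [{"orderkey": i, "quantity": i % 50, "price": 10.0 * i} for i in range(1000)]


def is_large_quantity(row: Row) -> bool:
    return row["quantity"] > 45


staged_stats = {"rows_read": 0}
piped_stats = {"rows_read": 0}

full = run_stage_by_stage(table, is_large_quantity, ["orderkey"], staged_stats)
# A LIMIT-style consumer: take only the first three matching rows.
first3 = list(islice(run_pipelined(table, is_large_quantity, ["orderkey"], piped_stats), 3))

print(first3)  # [{'orderkey': 46}, {'orderkey': 47}, {'orderkey': 48}]
print(staged_stats["rows_read"], piped_stats["rows_read"])  # 1000 49
```

Both strategies produce the same rows. The stage-by-stage run reads and stores the whole table before filtering. The pipelined run keeps only one row in flight and stops reading after row 48, because the consumer needed just three results. In a distributed shared-nothing engine, the same idea applies to tasks exchanging data between nodes rather than to local generators. That part is outside what this sketch models.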
