Abstract

Spark SQL lets Spark programmers query structured data inside Spark programs using SQL statements. It gives them a convenient way to leverage the benefits of relational processing, and its internal RDD-based distributed processing also accelerates queries on large data sets. However, Spark SQL is not designed for long-running services, and its built-in data source loads data from the storage system, such as HDFS or the local file system, in each table scan without any caching mechanism. Although users can keep data in memory by explicitly using the cache command, the cached data is coarse-grained. In this paper, we present an indexing structure that is a pluggable component of Spark SQL, built on Apache Spark. Compared with Spark SQL, it has two additional advantages. First, it allows users to create an index on the structured data to be processed, which greatly speeds up queries. Second, it enables programmers to load structured data into memory at the fine granularity of individual files, which makes it flexible to load hot data into memory and to evict cold data from memory.
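The two ideas above, an index that avoids full table scans and a cache that holds data file by file, can be illustrated with a minimal sketch in plain Python. It is not the implementation presented in this paper and does not use any Spark API. The names `FileCache`, `build_index`, `lookup`, and `STORAGE`, the in-memory stand-in for files, and the least-recently-used eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class FileCache:
    """Hypothetical file-level cache: keeps at most `capacity` files in memory
    and evicts the least recently used file (an assumed policy)."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # function that reads one file from storage
        self.files = OrderedDict()    # file name -> rows, oldest access first
        self.loads = 0                # number of reads that went to storage

    def get(self, name):
        if name in self.files:
            self.files.move_to_end(name)   # mark the file as recently used (hot)
            return self.files[name]
        rows = self.loader(name)           # cache miss: read the file from storage
        self.loads += 1
        self.files[name] = rows
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)  # evict the coldest file
        return rows


def build_index(storage, column):
    """Map each value of `column` to the (file name, row position) pairs holding it."""
    index = {}
    for name, rows in storage.items():
        for pos, row in enumerate(rows):
            index.setdefault(row[column], []).append((name, pos))
    return index


def lookup(index, cache, value):
    """Answer an equality query by touching only the files the index points to."""
    return [cache.get(name)[pos] for name, pos in index.get(value, [])]


# Stand-in for files on HDFS or a local file system (illustrative data only).
STORAGE = {
    "part-0": [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}],
    "part-1": [{"id": 3, "city": "Oslo"}],
    "part-2": [{"id": 4, "city": "Kyiv"}],
}

cache = FileCache(capacity=2, loader=lambda name: STORAGE[name])
city_index = build_index(STORAGE, "city")
oslo_rows = lookup(city_index, cache, "Oslo")   # reads part-0 and part-1 only
```

In this sketch the query for "Oslo" never reads `part-2`, and a later query that needs `part-2` pushes the least recently used file out of memory while the other cached file stays available.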
