Abstract

This paper proposes a big data query system that supports customized queries driven by specific business needs, and describes the system's components and structure. ANTLR is used as the language recognizer to design and implement a customized SQL dialect. The system builds a simpler query interface on top of Spark SQL that satisfies the query requirements of an Internet user behavior analysis platform.

Highlights

  • In recent years, as the number of mobile Internet users has grown, user data has steadily accumulated and big data technology has been applied ever more widely

  • Spark offers a programming abstraction called Resilient Distributed Datasets (RDDs), which enables more efficient data reuse than existing models such as MapReduce

  • Spark SQL is built around a new component called Catalyst, which is its key component; we will examine Catalyst in more depth in future work [9]

Summary

Introduction

As the number of mobile Internet users has grown, user data has steadily accumulated and big data technology has been applied ever more widely. Internet user behavior analysis is one important application, and it is reasonable to process user data with a big data query platform. A native SQL layer is introduced on top of Spark. The emergence of big data technologies such as Spark has provided strong technical support for large-scale data processing. Querying is the basic requirement of an Internet user behavior analysis system. This paper focuses on several common query scenarios of such a system and implements a customized query system that gives data analysts a quick and convenient query platform for handling bursts of query requirements.
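To make the idea of a customized query layer more concrete, the following minimal sketch translates one statement of a toy query dialect into a standard SQL string. Everything in it is an assumption made for illustration: the dialect syntax, the function name `translate`, and the table and column names (`user_events`, `user_id`, `event`, `dt`) do not come from the paper, and a regular expression stands in for the ANTLR-generated parser that the system actually uses.

```python
import re

# Hypothetical dialect, invented for illustration only:
#   COUNT USERS WHERE EVENT '<name>' FROM <yyyy-mm-dd> TO <yyyy-mm-dd>
# The regular expression below is a stand-in for a real grammar-based parser.
_QUERY = re.compile(
    r"^\s*COUNT\s+USERS\s+WHERE\s+EVENT\s+'(?P<event>[A-Za-z0-9_]+)'"
    r"\s+FROM\s+(?P<start>\d{4}-\d{2}-\d{2})"
    r"\s+TO\s+(?P<end>\d{4}-\d{2}-\d{2})\s*$",
    re.IGNORECASE,
)


def translate(query: str) -> str:
    """Translate one statement of the toy dialect into a standard SQL string."""
    match = _QUERY.match(query)
    if match is None:
        # Anything outside the tiny dialect is rejected rather than passed through.
        raise ValueError(f"unsupported query: {query!r}")
    # Table and column names here are assumed, not taken from the paper.
    return (
        "SELECT COUNT(DISTINCT user_id) AS users "
        "FROM user_events "
        f"WHERE event = '{match['event']}' "
        f"AND dt BETWEEN '{match['start']}' AND '{match['end']}'"
    )


if __name__ == "__main__":
    print(translate("COUNT USERS WHERE EVENT 'click' FROM 2019-01-01 TO 2019-01-07"))
```

In a setup like the one described here, the generated string could then be handed to Spark SQL, for example through `SparkSession.sql`. The point of the sketch is only the division of labor: analysts write a short domain-specific statement, and the translation layer produces the longer SQL that Spark SQL executes.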

Domain specific language on top of Spark SQL
The structure of the system and its components
An example of customized query and SQL parser implementation
Correctness and performance test
Conclusion and future work
Spark Jobserver