Abstract
This paper proposes a big data query system that supports customized queries tailored to specific business needs, and describes the components and structure of that system. ANTLR is used as the language recognizer to design and implement a customized SQL dialect. The system builds a simpler, easier-to-use query interface on top of Spark SQL, which satisfies the query requirements of an Internet user behavior analysis platform.
Highlights
In recent years, as the number of mobile Internet users has grown, user data has steadily accumulated and big data technology has been applied more and more widely.
Spark offers a programming abstraction called Resilient Distributed Datasets (RDDs), which enables more efficient data reuse than existing models such as MapReduce.
Spark SQL is built around a new key component called Catalyst, which we will examine in more depth in future work [9].
Summary
As the number of mobile Internet users has grown, user data has steadily accumulated and big data technology has been applied more and more widely. Internet user behavior analysis is one important application, and it is reasonable to process user data with a big data query platform. A native SQL layer is introduced on top of Spark. The emergence of big data technologies such as Spark has provided strong technical support for large-scale data processing. Querying is the basic requirement of an Internet user behavior analysis system. This paper focuses on several common query scenarios of such a system and implements a customized query system that gives data analysts a quick and convenient platform for handling bursts of query demand.
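To make the idea of a simplified query interface on top of Spark SQL concrete, the following is a minimal sketch of translating a short analyst-facing query into a Spark SQL string. It is an illustration only: the query syntax, the table name `user_events`, the column names, and the function `translate` are all hypothetical and do not come from the paper, and a regular expression stands in for the ANTLR-generated recognizer that the actual system uses.

```python
import re

# Hypothetical dialect (not the paper's grammar):
#   COUNT USERS WHERE EVENT = '<name>' FROM <yyyy-mm-dd> TO <yyyy-mm-dd>
# A regular expression is used here as a stand-in for an ANTLR parser.
PATTERN = re.compile(
    r"^COUNT\s+USERS\s+WHERE\s+EVENT\s*=\s*'(?P<event>\w+)'\s+"
    r"FROM\s+(?P<start>\d{4}-\d{2}-\d{2})\s+"
    r"TO\s+(?P<end>\d{4}-\d{2}-\d{2})$",
    re.IGNORECASE,
)


def translate(query: str) -> str:
    """Translate one hypothetical dialect query into a Spark SQL string."""
    match = PATTERN.match(query.strip())
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    event = match.group("event")
    start = match.group("start")
    end = match.group("end")
    # Table and column names below are assumptions made for illustration.
    return (
        "SELECT COUNT(DISTINCT user_id) AS users "
        "FROM user_events "
        f"WHERE event = '{event}' "
        f"AND event_date BETWEEN '{start}' AND '{end}'"
    )


if __name__ == "__main__":
    print(translate("COUNT USERS WHERE EVENT = 'login' FROM 2019-01-01 TO 2019-01-31"))
```

In a deployment, the resulting string could be handed to a `SparkSession` through its `sql` method. A grammar-based recognizer such as ANTLR would replace the regular expression once the dialect grows beyond a single query shape.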