Design and Implementation of HSQL: A SQL-like language for Data Analysis in Distributed Systems

Anurag Singh Bhadauria, Arjuna Chala, Jeremy Clements, Atreya Bain, Shobha G, Jyoti Shetty

doi:10.14569/ijacsa.2021.0121190


Open Access

https://doi.org/10.14569/ijacsa.2021.0121190


Abstract

We are seeing a substantial increase in the use of data across many fields, which has necessitated distributed systems to consume and process Big Data. Machine learning tends to benefit from Big Data, and the models generated from it tend to be more effective. However, handling Big Data has a steep learning curve, as traditional data management tools fail to perform well at this scale. Distributed systems, in which the task of data processing is split among the nodes of a cluster, have therefore become popular. SQL is a database management language that is popular with data scientists, but on these platforms it is often given second-class support: SQL can be embedded into the platform's primary language (e.g. SQL in Scala for Spark), which allows SQL to be used, but one still needs to know the primary language (Scala in that example, or ECL in HPCC Systems). SQL may also be present as a supported language in its own right. In either case, useful tooling such as data visualization and the creation and use of machine learning models becomes difficult to reach, as the user needs to fall back to the primary language of the system. In the proposed work, a new SQL-like language, HSQL, was developed for HPCC Systems, an open source distributed systems solution, to help new users get used to its distributed architecture and to ECL, the language with which it primarily operates and which was chosen as the target. Additionally, a program that translates HSQL programs to ECL was built. HSQL was made completely inter-compatible with ECL programs, and it provides a compact, easy-to-comprehend SQL-like syntax for general data analysis, the creation of machine learning models, and visualizations, while allowing such programs a modular structure.

Highlights

Data has become an essential resource in this age of computing, where a lot of the advancements and innovations we see are based upon models which require huge amounts of data to be built
A design for HSQL is shown which presents a concrete syntax and an implementation for a compiler that translates the specification to ECL, the language used in HPCC Systems
As HSQL is intended for data analysts, ECL was chosen as the target because of its highly optimizing compiler and because machine learning performs exceptionally fast in HPCC Systems, even outperforming configured Hadoop for the first iteration of many configured machine learning algorithms

Summary

INTRODUCTION

Data has become an essential resource in this age of computing, where many of the advancements and innovations we see are based on models that require huge amounts of data to be built. There are places where SQL has first-class support, but there, access to valuable tools such as visualizations and machine learning models, as well as to other language features, is restricted, even though these have become commonplace and important since the time SQL was developed. Targeted to work with ECL, HSQL complements ECL well by providing an abstraction that is easy to use for data scientists who are already familiar with SQL, while they can still take their time to learn the more complex and powerful ECL language for any complex solution they may require. A design for HSQL is shown, presenting a concrete syntax and an implementation of a compiler that translates the specification to ECL, the language used in HPCC Systems.
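To make the idea of translating an SQL-like statement into ECL more concrete, here is a minimal Python sketch. It is not the HSQL compiler and does not use the HSQL grammar, neither of which is shown in this summary. The function name `translate_select`, the tiny SELECT subset it accepts, and the exact shape of the emitted text are all assumptions made for illustration. The output follows ECL conventions (a filter in parentheses after a dataset name, `TABLE` for projection, `OUTPUT` to emit a result), but it has not been validated against the ECL compiler.

```python
import re

# Hypothetical pattern for a tiny SELECT subset; this is NOT the HSQL grammar.
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<src>\w+)"
    r"(?:\s+WHERE\s+(?P<cond>.+?))?\s*;?\s*$",
    re.IGNORECASE,
)


def translate_select(query: str) -> str:
    """Translate 'SELECT cols FROM src [WHERE cond]' into ECL-style text."""
    m = _SELECT.match(query)
    if m is None:
        raise ValueError(f"unsupported query: {query!r}")

    src = m.group("src")
    cond = m.group("cond")
    # ECL filters a dataset by writing a boolean expression in parentheses
    # directly after the dataset name.
    filtered = f"{src}({cond})" if cond else src

    cols = [c.strip() for c in m.group("cols").split(",")]
    if cols == ["*"]:
        # No projection needed: output the (possibly filtered) dataset as-is.
        return f"OUTPUT({filtered});"

    # Project the requested fields with TABLE and an inline record layout.
    fields = ", ".join(f"{src}.{c}" for c in cols)
    return f"OUTPUT(TABLE({filtered}, {{{fields}}}));"


if __name__ == "__main__":
    print(translate_select("SELECT name, age FROM people WHERE age > 30;"))
    print(translate_select("select * from people"))
```

A real translator would work from a parsed syntax tree rather than a regular expression, since a regex cannot handle nested expressions, joins, or SQL operators that differ from their ECL counterparts. The sketch passes the WHERE condition through unchanged, which is only safe for operators that mean the same thing in both languages.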

HPCC SYSTEMS AND ECL

DESIGN AND IMPLEMENTATION

Defining the HSQL Syntax

Language Recognition

Transpiler

CONCLUSIONS

Limitations and Future


Journal: International Journal of Advanced Computer Science and Applications
Publication Date: Jan 1, 2021
License type: cc-by

