SQL Generation from Natural Language: A Sequence-to-Sequence Model Powered by the Transformers Architecture and Association Rules

Youssef Mellah, El Hassane Ettifouri, Mohammed Ghaouth Belkasmi, Toumi Bouchentouf, Abdelkader Rhouati

doi:10.3844/jcssp.2021.480.489

Abstract

Using Natural Language (NL) to interact with relational databases allows users from any background to easily query and analyze large amounts of data. This requires a system that understands user questions and automatically converts them into a structured query language such as SQL. The best-performing Text-to-SQL systems use supervised learning (usually formulated as a classification problem), either by approaching the task as a sketch-based slot-filling problem or by first converting questions into an Intermediate Logical Form (ILF) and then converting it to the corresponding SQL query. However, non-supervised modeling that directly converts questions to SQL queries has proven more difficult. We therefore propose an approach that translates NL questions directly into SQL statements. In this study, we present a Sequence-to-Sequence (Seq2Seq) parsing model for the NL-to-SQL task, powered by the Transformer architecture and exploring two Language Models (LMs): the Text-To-Text Transfer Transformer (T5) and the multilingual pre-trained Text-To-Text Transformer (mT5). In addition, we adopt a transformation-based learning algorithm to update the aggregation predictions based on association rules. The resulting model achieves a new state of the art on the WikiSQL dataset for weakly supervised SQL generation.
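The idea of updating aggregation predictions with rules can be illustrated with a minimal sketch. It covers only the step of applying rules to an already generated query, not the transformation-based learning that would produce the rules. The trigger phrases in `RULES`, the function name `apply_aggregation_rules` and the regular expression are hypothetical placeholders introduced for illustration; they are not taken from the paper.

```python
import re

# Hypothetical, hand-written rules: question trigger phrase -> aggregation operator.
# They stand in for rules that would be derived from data; the paper's actual
# rules are not reproduced here.
RULES = [
    ("how many", "COUNT"),
    ("average", "AVG"),
    ("total", "SUM"),
]

# Matches "SELECT col FROM ..." or "SELECT AGG(col) FROM ..." for a single column.
SELECT_RE = re.compile(
    r"^SELECT\s+(?:(?P<agg>COUNT|AVG|SUM|MIN|MAX)\s*\(\s*(?P<inner>.+?)\s*\)"
    r"|(?P<bare>.+?))\s+(?P<rest>FROM\b.*)$",
    re.IGNORECASE | re.DOTALL,
)


def apply_aggregation_rules(question: str, sql: str) -> str:
    """Rewrite the aggregation operator of `sql` when a rule fires on `question`."""
    match = SELECT_RE.match(sql.strip())
    if match is None:
        return sql  # query shape not covered by this sketch; leave it untouched
    column = match.group("inner") or match.group("bare")
    lowered = question.lower()
    for trigger, agg in RULES:  # rules are tried in order; the first match wins
        if trigger in lowered:
            return f"SELECT {agg}({column}) {match.group('rest')}"
    return sql  # no rule fired; keep the model's prediction


predicted = "SELECT points FROM games WHERE season = 2010"
corrected = apply_aggregation_rules("What is the average points in 2010?", predicted)
print(corrected)
```

In this sketch a prediction is changed only when a rule fires, so queries whose questions contain no trigger phrase pass through unchanged.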

Highlights

Semantic Parsing (SP) is one of the most important tasks in Natural Language Processing (NLP). It requires both understanding the meaning of Natural Language (NL) sentences and mapping them to formal meaning representations (Zelle and Mooney, 1996; Panait and Luke, 2005; Clarke et al., 2010; Liang et al., 2011), often machine-executable programs, for a range of tasks such as question answering (Yih et al., 2014), robotic control (Matuszek et al., 2013) and intelligent tutoring systems (Graesser et al., 2005).
In the database area (Androutsopoulos et al., 1995; Popescu et al., 2003; Affolter et al., 2019), the general problem is known as “Natural Language Interfaces to Databases (NLIDBs)”. In particular, we are interested in translating natural language questions to SQL, owing to the popularity of SQL as the domain-specific language used to query and manage data stored in most available relational databases (Ramakrishnan et al., 1998).
We present our work on generating SQL queries from natural language.

Summary

Introduction

Semantic Parsing (SP) is one of the most important tasks in NLP. It requires both understanding the meaning of Natural Language (NL) sentences and mapping them to formal meaning representations (Zelle and Mooney, 1996; Panait and Luke, 2005; Clarke et al., 2010; Liang et al., 2011), often machine-executable programs, for a range of tasks such as question answering (Yih et al., 2014), robotic control (Matuszek et al., 2013) and intelligent tutoring systems (Graesser et al., 2005). In the database area (Androutsopoulos et al., 1995; Popescu et al., 2003; Affolter et al., 2019), the general problem is known as “Natural Language Interfaces to Databases (NLIDBs)”. In particular, we are interested in translating natural language questions to SQL, owing to the popularity of SQL as the domain-specific language used to query and manage data stored in most available relational databases (Ramakrishnan et al., 1998). Despite the importance of the task, researchers have only recently begun to apply Deep Learning (DL) methods to the crucial problem of NLIDBs. Translating NL to SQL is often referred to as “NL-to-SQL” or “Text-to-SQL” (Xu et al., 2017; Zhong et al., 2017; Shi et al., 2018; Yu et al., 2018; He et al., 2019; Hwang et al., 2019; Guo et al., 2019). Almost all of these works aim to achieve good results on well-known Text-to-SQL benchmarks such as ATIS, GeoQuery and WikiSQL (Xu et al., 2017; Shi et al., 2018; Dong and Lapata, 2018; Hwang et al., 2019; He et al., 2019).
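To make the Text-to-SQL task concrete, the following minimal sketch pairs one NL question with a hand-written SQL query and runs it on an in-memory SQLite database. The table `players`, its rows and the helper `run_example` are invented for illustration and do not come from WikiSQL or from the paper; in an actual system the query string would be produced by the model rather than written by hand.

```python
import sqlite3


def run_example():
    """Build a toy table, then answer one NL question through its SQL translation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (name TEXT, country TEXT, points INTEGER)")
    conn.executemany(
        "INSERT INTO players VALUES (?, ?, ?)",
        [("Ana", "Canada", 12), ("Bo", "Canada", 7), ("Cy", "Spain", 9)],
    )

    # Input side of the task: a natural language question.
    question = "How many players are from Canada?"
    # Output side of the task: the SQL query a Text-to-SQL system should generate
    # (written by hand here, as a stand-in for a model prediction).
    sql = "SELECT COUNT(name) FROM players WHERE country = 'Canada'"

    answer = conn.execute(sql).fetchone()[0]
    conn.close()
    return question, sql, answer


question, sql, answer = run_example()
print(question)
print(sql)
print(answer)  # 2, since two of the three toy rows have country = 'Canada'
```

Executing the generated query against the database, as done here, is also how a prediction can be checked by its result rather than by its exact string form.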

Methods

Results

Conclusion