SQLSketch: Generating SQL Queries using a sketch-based approach

Karam Ahkouk, Khadija Majhadi, Mustapha Machkour, Rachid Mama

doi:10.3233/jifs-210359

Abstract

In the last decade, many intelligent interfaces and layers have been proposed to let users query relational databases and extract their content using only natural language. However, most of them struggle when exposed to new databases. In this article, we present SQLSketch, a sketch-based network for generating SQL queries that addresses the problem of automatically translating natural language questions to SQL using the related database schemas. We argue that previous models that use a full or partial sequence-to-sequence structure in the decoding phase can, in fact, be counterproductive for generation and lose more of the context or meaning of the user's question. We therefore use a fully sketch-based structure that decouples the generation process into many small prediction modules. SQLSketch is evaluated on GreatSQL, a new cross-domain, large-scale, and balanced dataset for the natural language to SQL translation task. With the long-term aim of enabling better models and contributing further improvements to semantic parsing tasks, we propose the GreatSQL dataset as the first balanced cross-domain corpus; it includes 45,969 pairs of natural language questions and their corresponding SQL queries, in addition to simplified and well-structured ground-truth annotations. We establish results for SQLSketch on the GreatSQL dataset and compare its performance against two popular types of models that represent the sequential and partial-sketch-based approaches. Experimental results show that SQLSketch outperforms the baseline models by 13% in exact matching accuracy and achieves a score of 23.9%, making it the new state-of-the-art model on GreatSQL.
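To make the idea of decoupling generation into small prediction modules concrete, the following is a minimal illustrative sketch, not the SQLSketch architecture itself. It assumes a deliberately tiny query sketch (one SELECT column with an optional aggregate), and every name in it (`Schema`, `predict_aggregate`, `predict_column`, `fill_sketch`, `exact_match_accuracy`) is hypothetical. Keyword heuristics stand in for the learned modules, and the accuracy function is a simplified string-level comparison that may differ from the exact matching metric used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Schema:
    """Hypothetical single-table schema: a table name and its columns."""
    table: str
    columns: List[str]


def predict_aggregate(question: str) -> str:
    """Slot module 1: choose the aggregate (keyword stand-in for a classifier)."""
    q = question.lower()
    if "how many" in q:
        return "COUNT"
    if "average" in q:
        return "AVG"
    return ""  # no aggregate


def predict_column(question: str, schema: Schema) -> str:
    """Slot module 2: choose the SELECT column by matching schema column names."""
    q = question.lower()
    for col in schema.columns:
        if col.replace("_", " ") in q:
            return col
    return "*"  # fall back to all columns


def fill_sketch(question: str, schema: Schema) -> str:
    """Fill the fixed sketch SELECT <agg>(<col>) FROM <table> slot by slot."""
    agg = predict_aggregate(question)
    col = predict_column(question, schema)
    select = f"{agg}({col})" if agg else col
    return f"SELECT {select} FROM {schema.table}"


def exact_match_accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of predictions equal to gold after case/whitespace normalization."""
    def normalize(sql: str) -> str:
        return " ".join(sql.lower().split())

    hits = sum(normalize(p) == normalize(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


schema = Schema(table="employees", columns=["name", "salary"])
print(fill_sketch("What is the average salary?", schema))
```

Because each slot is predicted independently from the question and the schema, an error in one module does not propagate through a token-by-token decoder, which is the intuition behind preferring a full sketch over a sequence-to-sequence decoding phase.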

Full Text