Abstract

Text-to-SQL is the task of mapping natural language utterances to structured query language (SQL). Prior studies focus on the information retrieval aspect of this task. In this paper, we demonstrate a new use case for text-to-SQL in which a user can create database models from natural language, and we introduce the first dataset for this task. Furthermore, we propose a framework that consists of three modular components: (1) a classifier component that predicts the data type and constraints of a column, (2) a constraint component that establishes foreign key relationships between tables, and (3) a query component that generates a series of CREATE queries through a slot-filling approach. We propose several baseline models to evaluate the classifier component from different aspects. Each model is based on a state-of-the-art pre-trained language model, which allows us to assess contextualized word representations on the table creation task. Our results show that such representations play a vital role in classifying column data types and constraints correctly. Two drawbacks of pre-trained models are training time and model size. Our experiments reveal that a multi-task BERT model, achieving 75% and 96% accuracy on the data type and constraint prediction tasks, respectively, effectively mitigates both problems.
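To make the query component's slot-filling approach concrete, the sketch below shows one way such a step could work: the predicted column data types and constraints from the classifier component are filled into a CREATE TABLE template. This is a minimal illustration under assumed interfaces, not the paper's actual implementation; the function name, input format, and example schema are hypothetical.

```python
def build_create_query(table, columns):
    """Fill a CREATE TABLE template from per-column predictions.

    columns: list of (name, data_type, constraints) tuples, where
    data_type and constraints are assumed outputs of a classifier
    component (hypothetical interface, for illustration only).
    """
    col_defs = []
    for name, dtype, constraints in columns:
        # Each column slot becomes "name TYPE CONSTRAINT ...".
        col_defs.append(" ".join([name, dtype] + list(constraints)))
    return f"CREATE TABLE {table} ({', '.join(col_defs)});"


# Example: a table predicted from an utterance like
# "students have an id, a required name, and an advisor".
query = build_create_query(
    "students",
    [
        ("id", "INTEGER", ["PRIMARY KEY"]),
        ("name", "VARCHAR(100)", ["NOT NULL"]),
        ("advisor_id", "INTEGER", []),
    ],
)
print(query)
# CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL, advisor_id INTEGER);
```

In a full pipeline, a constraint component would additionally emit FOREIGN KEY clauses linking columns such as `advisor_id` to other tables.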
