Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation

Jiaqi Guo, Dongmei Zhang, Zecheng Zhan, Ting Liu, Yan Gao, Yan Xiao, Jian-Guang Lou

doi:10.18653/v1/p19-1444

Abstract

We present a neural approach called IRNet for complex and cross-domain Text-to-SQL. IRNet aims to address two challenges: 1) the mismatch between intents expressed in natural language (NL) and the implementation details in SQL; 2) the challenge in predicting columns caused by the large number of out-of-domain words. Instead of end-to-end synthesizing a SQL query, IRNet decomposes the synthesis process into three phases. In the first phase, IRNet performs a schema linking over a question and a database schema. Then, IRNet adopts a grammar-based neural model to synthesize a SemQL query which is an intermediate representation that we design to bridge NL and SQL. Finally, IRNet deterministically infers a SQL query from the synthesized SemQL query with domain knowledge. On the challenging Text-to-SQL benchmark Spider, IRNet achieves 46.7% accuracy, obtaining 19.5% absolute improvement over previous state-of-the-art approaches. At the time of writing, IRNet achieves the first position on the Spider leaderboard.

Highlights

Recent years have seen a great deal of renewed interest in Text-to-SQL, i.e., synthesizing a SQL query from a question.
Advanced neural approaches synthesize SQL queries in an end-to-end manner and achieve more than 80% exact matching accuracy on public Text-to-SQL benchmarks (e.g., ATIS, GeoQuery and WikiSQL) (Krishnamurthy et al., 2017; Zhong et al., 2017; Xu et al., 2017; Yaghmazadeh et al., 2017; Yu et al., 2018a; Dong and Lapata, 2018; Wang et al., 2018; Hwang et al., 2019).
The goal of schema linking in IRNet is to recognize the columns and the tables mentioned in a question, and to assign different types to the columns based on how they are mentioned in the question.
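
The schema linking idea in the last highlight can be illustrated with a minimal sketch. It is not IRNet's implementation: the function names `ngrams` and `link_schema`, the exact-string matching rule, the longest-match-first order, and the `max_n` limit are all assumptions made for illustration. The sketch only tags each matched span as a column or a table, and does not distinguish the finer column types that the highlight mentions.

```python
from typing import Iterator, List, Set, Tuple


def ngrams(tokens: List[str], max_n: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, text) for every n-gram, longest n-grams first."""
    for n in range(min(max_n, len(tokens)), 0, -1):
        for start in range(len(tokens) - n + 1):
            yield start, start + n, " ".join(tokens[start:start + n])


def link_schema(question: str, columns: Set[str], tables: Set[str],
                max_n: int = 6) -> List[Tuple[str, str]]:
    """Tag question spans that exactly match a column or table name.

    Hypothetical simplification: exact matches only, and a token that is
    already part of a longer match is never matched again.
    """
    tokens = question.lower().replace("?", "").split()
    taken = [False] * len(tokens)
    found = []  # (start, text, kind)
    for start, end, text in ngrams(tokens, max_n):
        if any(taken[start:end]):
            continue
        if text in columns:
            kind = "column"
        elif text in tables:
            kind = "table"
        else:
            continue
        found.append((start, text, kind))
        for k in range(start, end):
            taken[k] = True
    # Report matches in the order they appear in the question.
    return [(text, kind) for _, text, kind in sorted(found)]


links = link_schema(
    "Show the book title of each author?",
    columns={"book title", "title", "name"},
    tables={"author", "book"},
)
print(links)
```

Matching longer spans first keeps "book title" from being split into a table mention ("book") and a shorter column mention ("title").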

Summary

Introduction

Recent years have seen a great deal of renewed interest in Text-to-SQL, i.e., synthesizing a SQL query from a question. The implementation details of SQL are rarely considered by end users and rarely mentioned in questions. This poses a severe challenge for existing end-to-end neural approaches, which must synthesize SQL queries in the absence of a detailed specification. The large number of out-of-domain (OOD) words poses another steep challenge in predicting columns in SQL queries (Yu et al., 2018b), because OOD words usually lack accurate representations in neural models. We regard this challenge as a lexical problem. IRNet adopts a grammar-based neural model to synthesize a SemQL query, which is an intermediate representation (IR) that we design to bridge NL and SQL. The results suggest that designing an effective intermediate representation to bridge NL and SQL is a promising direction for complex and cross-domain Text-to-SQL.
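
To make the mismatch between questions and SQL concrete, the following toy sketch shows how an implementation detail that a question never mentions, here the FROM clause, could be inferred deterministically from the schema. The `ToyIR` class and the `infer_sql` function are hypothetical and are not the SemQL grammar or IRNet's inference procedure. The sketch assumes that column names are unique across tables and that a query touches a single table, so joins are out of scope.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyIR:
    """A toy intermediate representation with no FROM clause (hypothetical)."""
    select: List[str]                               # bare column names
    filter: Optional[Tuple[str, str, str]] = None   # (column, operator, literal)


def infer_sql(ir: ToyIR, schema: Dict[str, List[str]]) -> str:
    """Fill in the FROM clause by looking up which table owns each column."""
    # Assumes every column name belongs to exactly one table.
    owner = {col: table for table, cols in schema.items() for col in cols}
    used = list(ir.select) + ([ir.filter[0]] if ir.filter else [])
    tables: List[str] = []
    for col in used:
        if owner[col] not in tables:
            tables.append(owner[col])
    if len(tables) != 1:
        raise NotImplementedError("join inference is omitted from this sketch")
    sql = f"SELECT {', '.join(ir.select)} FROM {tables[0]}"
    if ir.filter:
        col, op, literal = ir.filter
        sql += f" WHERE {col} {op} {literal}"
    return sql


schema = {"singer": ["name", "age"]}
ir = ToyIR(select=["name"], filter=("age", ">", "30"))
print(infer_sql(ir, schema))  # SELECT name FROM singer WHERE age > 30
```

The model in this picture only has to predict what the question expresses (which columns, which condition), while the table lookup is handled by a fixed rule over the schema.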

Intermediate Representation

Schema Linking

Experiment Setup

Experimental Results

Ablation Study

Error Analysis

Discussion

Related Work

Conclusion

Publication Date: Jan 1, 2019
Citations: 289
License type: cc-by

Similar Papers

An Interactive NL2SQL Approach with Reuse Strategy
Xiaxia Wang ... Ke Chen
01 Jan 2020

Schema-Based Natural Language Semantic Mapping
Niculae Stratica ... Bipin C Desai
01 Jan 2004

Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks
Yuyu Luo ... Chengliang Chai
09 Jun 2021

Generalizing to New Domains by Mapping Natural Language to Lifted LTL
Eric Hsiung ... Xinyu Liu
23 May 2022
