Abstract

We present a neural approach called IRNet for complex and cross-domain Text-to-SQL. IRNet aims to address two challenges: 1) the mismatch between intents expressed in natural language (NL) and the implementation details in SQL; 2) the difficulty of predicting columns caused by the large number of out-of-domain words. Instead of synthesizing a SQL query end-to-end, IRNet decomposes the synthesis process into three phases. In the first phase, IRNet performs schema linking over a question and a database schema. Then, IRNet adopts a grammar-based neural model to synthesize a SemQL query, an intermediate representation that we design to bridge NL and SQL. Finally, IRNet deterministically infers a SQL query from the synthesized SemQL query using domain knowledge. On the challenging Text-to-SQL benchmark Spider, IRNet achieves 46.7% accuracy, a 19.5% absolute improvement over previous state-of-the-art approaches. At the time of writing, IRNet holds the first position on the Spider leaderboard.

Highlights

  • Recent years have seen a great deal of renewed interest in Text-to-SQL, i.e., synthesizing a SQL query from a question

  • Advanced neural approaches synthesize SQL queries in an end-to-end manner and achieve more than 80% exact matching accuracy on public Text-to-SQL benchmarks (e.g., ATIS, GeoQuery and WikiSQL) (Krishnamurthy et al, 2017; Zhong et al, 2017; Xu et al, 2017; Yaghmazadeh et al, 2017; Yu et al, 2018a; Dong and Lapata, 2018; Wang et al, 2018; Hwang et al, 2019)

  • The goal of schema linking in IRNet is to recognize the columns and the tables mentioned in a question, and assign different types to the columns based on how they are mentioned in the question
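
As a rough illustration of the schema linking goal described in the last highlight, the sketch below matches question n-grams against table and column names and tags each hit as an exact or partial match. It is a minimal sketch under stated assumptions, not IRNet's implementation: the function names `ngrams` and `link_schema`, the `max_n` limit, the whitespace tokenization, and the "exact"/"partial" labels are all hypothetical choices made for this example.

```python
from typing import Iterator, List, Tuple


def ngrams(tokens: List[str], max_n: int = 3) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, text) spans over the tokens, longest spans first."""
    for n in range(min(max_n, len(tokens)), 0, -1):
        for i in range(len(tokens) - n + 1):
            yield i, i + n, " ".join(tokens[i:i + n])


def link_schema(
    question: str, tables: List[str], columns: List[str]
) -> List[Tuple[str, str, str]]:
    """Return (span text, 'table' | 'column', 'exact' | 'partial') links.

    Hypothetical matching rules for illustration only:
    - exact: the span equals a table or column name
    - partial: a single-word span is one of the words of a column name
    """
    tokens = question.lower().strip("?. ").split()
    used = [False] * len(tokens)  # tokens already covered by a longer match
    links = []
    for start, end, text in ngrams(tokens):
        if any(used[start:end]):
            continue
        if text in tables:
            kind, match = "table", "exact"
        elif text in columns:
            kind, match = "column", "exact"
        elif any(text in col.split() for col in columns):
            kind, match = "column", "partial"
        else:
            continue
        links.append((text, kind, match))
        used[start:end] = [True] * (end - start)
    return links


links = link_schema(
    "What is the average age of each singer by country?",
    tables=["singer", "concert"],
    columns=["name", "age", "country", "song name"],
)
```

Trying longer spans first keeps a multi-word column name from being split into several single-word matches. A practical system would also need normalization such as stemming, which this sketch omits.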


Summary

Introduction

Recent years have seen a great deal of renewed interest in Text-to-SQL, i.e., synthesizing a SQL query from a question. The implementation details of SQL are rarely considered by end users and rarely mentioned in questions. This poses a severe challenge for existing end-to-end neural approaches, which must synthesize SQL queries in the absence of a detailed specification. The large number of out-of-domain (OOD) words poses another steep challenge in predicting columns in SQL queries (Yu et al, 2018b), because OOD words usually lack accurate representations in neural models. We regard this challenge as a lexical problem. IRNet adopts a grammar-based neural model to synthesize a SemQL query, which is an intermediate representation (IR) that we design to bridge NL and SQL. IRNet's results reveal that designing an effective intermediate representation to bridge NL and SQL is a promising direction for complex and cross-domain Text-to-SQL.
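
To make the idea of deterministic inference from an intermediate representation concrete, the sketch below uses a toy IR that records only what the user asked for (columns and optional aggregates) and leaves the FROM and GROUP BY clauses to be filled in by fixed rules. This is a sketch under stated assumptions, not the SemQL grammar or IRNet's inference procedure: the `Item` and `ToyIR` classes, the `to_sql` function, and the single-table restriction are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    column: str
    table: str
    agg: Optional[str] = None  # e.g. "avg" or "count"; None for a plain column


@dataclass
class ToyIR:
    """Hypothetical intermediate representation: selected items only."""
    select: List[Item]


def render(item: Item) -> str:
    ref = f"{item.table}.{item.column}"
    return f"{item.agg.upper()}({ref})" if item.agg else ref


def to_sql(ir: ToyIR) -> str:
    """Deterministically fill in clauses the toy IR does not state."""
    select_clause = ", ".join(render(i) for i in ir.select)

    # Rule 1: infer FROM from the tables the selected columns belong to.
    tables = sorted({i.table for i in ir.select})
    if len(tables) != 1:
        raise ValueError("joins are out of scope for this sketch")
    sql = f"SELECT {select_clause} FROM {tables[0]}"

    # Rule 2: if aggregated and plain columns are mixed,
    # group by the plain columns.
    plain = [render(i) for i in ir.select if i.agg is None]
    has_agg = any(i.agg for i in ir.select)
    if has_agg and plain:
        sql += " GROUP BY " + ", ".join(plain)
    return sql


query = to_sql(ToyIR([Item("country", "singer"), Item("age", "singer", "avg")]))
# query == "SELECT singer.country, AVG(singer.age) FROM singer GROUP BY singer.country"
```

The point of the design is that the question "average age per country" never mentions grouping or a source table, so the learned model only has to predict the IR, while rule-based code supplies the SQL-specific details.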

Intermediate Representation
Schema Linking
Experiment Setup
Experimental Results
Ablation Study
Error Analysis
Discussion
Related Work
Conclusion