Abstract

The existing models for NL2SQL tasks are mainly oriented toward English text and cannot solve the problems of column name reuse in Chinese text data, description in natural language query, and inconsistent representation of data stored in the database. To address this problem, this paper proposes a Chinese cross-domain NL2SQL model based on a heterogeneous graph and relative position attention mechanism. This model introduces relational structure information defined by the expert to construct initial heterogeneous graphs for database schemas and natural language questions. The heterogeneous graph is pruned based on natural language questions, and the multi-head relative position attention mechanism is used to encode the database schema and natural language questions. The target SQL statement is generated using a tree-structured decoder with predefined SQL syntax. Experimental results on the CSpider dataset demonstrate that our model better aligns database schema with natural language questions and understands the semantic information in natural language queries, effectively improving the matching accuracy of Chinese multi-table SQL statement generation.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call