Abstract

Database applications are at the core of most web application systems, such as web-based email, source code repository management, public scientific data repository management, news portals, and publication repositories in various fields. However, the use of these database systems for data and information retrieval is severely limited by the lack of support for search queries expressed in natural language (NL). Most web interfaces for databases today accept only search queries entered as some logical combination of keywords or text strings, which restricts the scope and depth of what a web user can search for, even though natural-language-based data and information retrieval has made significant advances in recent years. To overcome, or at least alleviate, this limitation in web information services, we propose in this article an improved neural model for NL querying of databases, based on the existing framework IRNet, in which a Gated Graph Neural Network (GGNN) representation is introduced to encode the database entities and relations. We also represent the database values and use them in the prediction model to identify and match table and column names, so that a correct SQL statement can be synthesized automatically from a query expressed as an NL sentence. Experiments with a public dataset demonstrate the promising potential of our approach.

Highlights

  • Database (DB) applications are now the backbone of most web-based information services, such as web-based email, source code repository management, public scientific data repository management, news portals, and publication repositories in various fields [1]–[3]

  • We introduced a Gated Graph Neural Network (GGNN) representation [21], [22] to encode the DB schema, replacing the original IRNet representation of the DB schema

  • To show the value of maximizing the use of the information embedded in relational databases for improving the prediction performance of a text-to-SQL (TTS) system, we have described the following two new algorithmic components as extensions to the IRNet neural model: 1) Introducing database values into the model, computing the similarity between natural language questions or queries and the database values, and establishing correlations between database values and column names through an attention mechanism
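
The value-matching idea in the last highlight can be pictured with a small sketch. This is not the paper's model: the learned similarity and attention are replaced here by a hand-written character-trigram overlap and a plain softmax, and the question, column names, stored values, and the `temperature` parameter are all hypothetical choices made only for illustration.

```python
import math


def char_ngrams(text, n=3):
    """Return the set of lowercase character n-grams of `text`."""
    s = text.lower()
    if len(s) <= n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def similarity(a, b):
    """Jaccard overlap of character trigrams, a value in [0, 1]."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)


def column_scores(question, column_values):
    """Score each column by its best question-token / stored-value match."""
    tokens = question.split()
    return {
        col: max((similarity(t, str(v)) for t in tokens for v in values), default=0.0)
        for col, values in column_values.items()
    }


def attention_weights(scores, temperature=0.1):
    """Softmax over the column scores, so the weights sum to 1."""
    exps = {col: math.exp(s / temperature) for col, s in scores.items()}
    total = sum(exps.values())
    return {col: e / total for col, e in exps.items()}


# Hypothetical toy data, not taken from the paper or from Spider.
question = "Which singers are from France"
column_values = {
    "singer.name": ["Joe Sharp", "Timbaland"],
    "singer.country": ["France", "Japan"],
}

scores = column_scores(question, column_values)
weights = attention_weights(scores)
best_column = max(weights, key=weights.get)
print(best_column, weights)
```

Because the token "France" matches a value stored in `singer.country`, that column receives almost all of the weight, which is the kind of value-to-column correlation the highlight describes. In a trained model, both the similarity function and the weighting would be learned rather than fixed.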

Summary

INTRODUCTION

Database (DB) applications are now the backbone of most web-based information services, such as web-based email, source code repository management, public scientific data repository management, news portals, and publication repositories in various fields [1]–[3]. Most users without such knowledge and expertise will likely be unable to take full advantage of the search tool for their data or information needs. Such a limitation can only be overcome, or at least alleviated, by a natural language interface that supports NL query to SQL query (NL-SQL), or text-to-SQL (TTS), capabilities. Yu et al. [17] proposed Spider, a large-scale, complex, and cross-domain Text-to-SQL dataset containing databases with multiple tables. SyntaxSQLNet [18] is the first model developed for the Spider task, using a syntax tree that represents the features of the SQL queries. It also proposed a method for generating cross-domain training data to enhance model performance through data augmentation. Guo et al. [23] proposed an interesting deep-neural-network-based approach, IRNet, to tackle complex and cross-domain Text-to-SQL problems using the Spider dataset.
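
To make the NL-SQL task concrete, the following minimal sketch pairs one natural language question with the SQL statement a text-to-SQL system would be expected to produce, and runs that statement with Python's built-in `sqlite3` module. The `singer` table, its rows, and the question are hypothetical and are not drawn from Spider or from the paper.

```python
import sqlite3

# Hypothetical single-table database held in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO singer (name, country) VALUES (?, ?)",
    [("Alice", "France"), ("Bob", "Japan"), ("Chloe", "France")],
)

# Input to a text-to-SQL system: a question in natural language.
question = "How many singers are from France?"

# Desired output: a SQL statement that answers the question. Here it is
# written by hand; producing it automatically is the job of the model.
target_sql = "SELECT COUNT(*) FROM singer WHERE country = 'France'"

(count,) = conn.execute(target_sql).fetchone()
print(question, "->", count)
conn.close()
```

Note that the word "France" appears in the question and as a stored value, but the column name `country` does not appear in the question at all, which is one reason database values can help in choosing the right column.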

METHODS
ENCODING DB SCHEMA WITH GRAPH NEURAL NETWORK
Merge the table and column vectors into a single node vector
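
As a rough illustration of the two headings above, the sketch below stacks table and column vectors into one set of node states and applies the standard gated (GRU-style) propagation step of a GGNN over a schema graph. It assumes NumPy is available. The schema, the edge types, the hidden size, the random stand-ins for learned weights, and the reading of "merge" as stacking are all assumptions made for this sketch, not details confirmed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size, chosen arbitrarily for this sketch

# Hypothetical schema: two tables and three columns.
tables = ["singer", "concert"]
columns = ["singer.singer_id", "singer.country", "concert.singer_id"]

# Random stand-ins for learned embeddings of table and column names.
table_vecs = rng.normal(scale=0.1, size=(len(tables), d))
column_vecs = rng.normal(scale=0.1, size=(len(columns), d))

# Assumed reading of "merge": stack both sets into one node-state matrix,
# so tables are nodes 0..1 and columns are nodes 2..4.
h0 = np.vstack([table_vecs, column_vecs])

# Typed, directed edges as (source, destination, edge_type).
# 0 = column belongs to table, 1 = table has column, 2 = foreign key.
edges = [
    (2, 0, 0), (3, 0, 0), (4, 1, 0),
    (0, 2, 1), (0, 3, 1), (1, 4, 1),
    (4, 2, 2),
]
num_edge_types = 3

# Random stand-ins for the learned weight matrices.
params = {"W_edge": rng.normal(scale=0.1, size=(num_edge_types, d, d))}
for name in ("W_z", "U_z", "W_r", "U_r", "W_h", "U_h"):
    params[name] = rng.normal(scale=0.1, size=(d, d))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ggnn_step(h, edges, params):
    """One gated propagation step over the typed schema graph."""
    # Aggregate messages: each edge sends its source state through the
    # weight matrix of its edge type to the destination node.
    a = np.zeros_like(h)
    for src, dst, etype in edges:
        a[dst] += h[src] @ params["W_edge"][etype]
    # GRU-style update of every node state from its aggregated message.
    z = sigmoid(a @ params["W_z"] + h @ params["U_z"])  # update gate
    r = sigmoid(a @ params["W_r"] + h @ params["U_r"])  # reset gate
    h_tilde = np.tanh(a @ params["W_h"] + (r * h) @ params["U_h"])
    return (1.0 - z) * h + z * h_tilde


# Two propagation steps let information travel across two hops, for
# example from singer.country to singer.singer_id via the singer table.
h = h0
for _ in range(2):
    h = ggnn_step(h, edges, params)
print(h.shape)
```

Each row of `h` is a schema-aware vector for one table or column, which a downstream decoder could then use when choosing schema items for the SQL query.
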
DATASET
Findings
CONCLUSION