Automatic Text-to-SQL Machine Translation for Scholarly Publication Database Search

Sulochana Deshmukh, Marwan Bikdash

doi:10.1109/southeastcon44009.2020.9368296

Abstract

A Database Management System (DBMS) is typically accessed using Structured Query Language (SQL). Here, we investigate a structured method for parsing natural-language questions and present a text-to-SQL translator for scholarly publication database search. The work includes the implementation of phrase-based information retrieval and a text-to-SQL translator. The information retrieval emphasizes the recognition of the Named Entity (NE), the Named Entity Attribute (NEA), the Question Intention (QI), and the focus of the question. An NE is a database entry, and the NEA is the attribute, or column name, of the corresponding NE entry. The QI is the type of question. Because the proposed translator is domain-specific, the scope of named entities is limited to the values in the database, so we use the database itself as the knowledge base. We derive rules composed of phrases, predicates, and substitution strings. The phrases are regular expressions and the predicates are stop words. Natural language processing usually removes stop words; instead, we use many of them as predicates, which helps the recognition of NE, NEA, and QI. The cosine similarity measure for text is used to validate NE, NEA, and QI. The rule-based approach is computationally efficient and requires no exhaustive training on the dataset, while providing precise translation.
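The rule structure described in the abstract (a regular-expression phrase, a stop-word predicate, and a substitution string, with cosine similarity validating the recognized entity against database values) can be illustrated with a minimal sketch. This is not the authors' implementation: the table name `publications`, the column names, the two rules, the bag-of-words form of cosine similarity, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base built from the database itself:
# attribute (column) name -> values stored in that column.
KNOWLEDGE_BASE = {
    "author": ["Marwan Bikdash", "Sulochana Deshmukh"],
    "venue": ["SoutheastCon"],
}

# Hypothetical rules: (phrase regex, NEA, substitution string).
# The stop words "by" and "in" serve as predicates that signal the attribute.
RULES = [
    (re.compile(r"\bby\s+(?P<ne>.+?)\??$", re.IGNORECASE), "author",
     "SELECT title FROM publications WHERE author = '{ne}'"),
    (re.compile(r"\bin\s+(?P<ne>.+?)\??$", re.IGNORECASE), "venue",
     "SELECT title FROM publications WHERE venue = '{ne}'"),
]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between lowercase bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def translate(question: str, threshold: float = 0.5):
    """Return an SQL string for the first rule whose entity is validated, else None."""
    for pattern, attribute, template in RULES:
        match = pattern.search(question.strip())
        if not match:
            continue
        candidate = match.group("ne")
        # Validate the candidate NE against the known values of this attribute.
        best = max(KNOWLEDGE_BASE[attribute],
                   key=lambda value: cosine_similarity(candidate, value))
        if cosine_similarity(candidate, best) >= threshold:
            # Substitute the canonical database value, doubling single quotes.
            return template.format(ne=best.replace("'", "''"))
    return None


print(translate("List papers by marwan bikdash"))
print(translate("Which papers were published in SoutheastCon?"))
print(translate("List papers by John Smith"))
```

In this sketch a question whose candidate entity matches no database value closely enough yields `None` rather than a query, which mirrors the idea of restricting named entities to the database contents. A production system would normally pass the entity as a bound query parameter instead of formatting it into the SQL string.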

Full Text