As more real-world semantic data is stored in knowledge graphs, providing intuitive and effective query methods for end users has become a fundamental and challenging task. Because there is a gap between plain natural language questions (NLQs) and structured data, most RDF question answering (Q/A) systems construct SPARQL queries from NLQs and obtain precise answers from knowledge graphs. A major challenge is disambiguating the mapping of phrases and relations in a question to items in the dataset, especially for complex questions. In this paper, we propose a novel data-driven graph similarity framework for RDF Q/A that extracts query graph patterns directly from the knowledge graph instead of constructing them from semantically mapped items. We introduce an uncertain question graph to model the possible interpretations of an NLQ, which reduces our problem to a graph alignment problem. The alignment considers both the lexical and the structural similarity of the graphs, and the resulting target RDF subgraph is used as the query graph pattern from which the final query is constructed. To reduce the search space on the knowledge graph, we dynamically create a pruned entity graph based on the complexity of the input question. Moreover, to reduce the computational cost of graph similarity, we compute similarity scores only for graph elements at the same distance and equip the process with an edge association-aware surface form extraction method. Empirical studies on real datasets indicate that the proposed approach is flexible and effective, and that it significantly outperforms state-of-the-art methods.
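
As a toy illustration of how lexical and structural similarity might be combined when aligning a question graph with a candidate RDF subgraph, consider the minimal sketch below. It is not the framework's actual formulation: the function names `lexical_sim` and `alignment_score`, the position-wise edge alignment, the use of `difflib` string similarity, the node-consistency measure, and the weight `alpha` are all assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def lexical_sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (illustrative choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def alignment_score(question, candidate, alpha=0.5):
    """Score a position-wise alignment of question edges to candidate edges.

    Both arguments are equal-length lists of (subject, relation, object)
    label triples; question nodes starting with "?" are variables.
    `alpha` is a hypothetical weight between the two similarity terms.
    """
    if not question or len(question) != len(candidate):
        return 0.0
    lexical = []    # label similarities for relations and non-variable nodes
    mapping = {}    # question node -> candidate node implied so far
    consistent = 0  # edges whose endpoints agree with the mapping
    for (qs, qr, qo), (cs, cr, co) in zip(question, candidate):
        lexical.append(lexical_sim(qr, cr))
        edge_ok = True
        for q_node, c_node in ((qs, cs), (qo, co)):
            if not q_node.startswith("?"):
                lexical.append(lexical_sim(q_node, c_node))
            # A question node must always map to the same candidate node.
            if mapping.setdefault(q_node, c_node) != c_node:
                edge_ok = False
        consistent += edge_ok
    lexical_score = sum(lexical) / len(lexical)
    structural_score = consistent / len(question)
    return alpha * lexical_score + (1 - alpha) * structural_score


# Hypothetical question graph: "Who starred in films directed by Christopher Nolan?"
question = [("?film", "directed by", "Christopher Nolan"),
            ("?film", "starring", "?actor")]
connected = [("Inception", "director", "Christopher Nolan"),
             ("Inception", "starring", "Leonardo DiCaprio")]
disconnected = [("Inception", "director", "Christopher Nolan"),
                ("Titanic", "starring", "Leonardo DiCaprio")]

print(alignment_score(question, connected))
print(alignment_score(question, disconnected))
```

In this sketch the two candidates have identical labels on every aligned element, so only the structural term separates them: the second candidate binds `?film` to two different entities and therefore scores lower. A real system would also have to search over possible alignments instead of assuming one.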