Abstract
Various knowledge bases (KBs) have been constructed via information extraction from encyclopedias, text and tables, as well as alignment of multiple sources. Their usefulness and usability are often limited by quality issues. One common issue is the presence of erroneous assertions and alignments, often caused by lexical or semantic confusion. We study the problem of correcting such assertions and alignments, and present a general correction framework which combines lexical matching, context-aware sub-KB extraction, semantic embedding, soft constraint mining and semantic consistency checking. The framework is evaluated with one set of literal assertions from DBpedia, one set of entity assertions from an enterprise medical KB, and one set of mapping assertions from a music KB constructed by integrating Wikidata, Discogs and MusicBrainz. It has achieved promising results, with a correction rate (i.e., the ratio of the target assertions/alignments that are corrected with the right substitutes) of 70.1%, 60.9% and 71.8%, respectively.
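The correction rate defined above can be illustrated with a minimal sketch. The function name `correction_rate`, the dictionary-based data layout and the toy identifiers (`t1`, `ex:Berlin`, and so on) are assumptions made here for illustration; they are not taken from the paper's implementation or data.

```python
from typing import Dict, Hashable, Optional


def correction_rate(
    proposed: Dict[Hashable, Optional[str]],
    gold: Dict[Hashable, str],
) -> float:
    """Share of target assertions whose proposed substitute matches the gold one.

    `gold` maps each target (erroneous) assertion to its right substitute.
    `proposed` maps targets to the substitute a correction method returned,
    or None when the method returned nothing for that target.
    """
    if not gold:
        raise ValueError("gold must contain at least one target assertion")
    # A target counts as corrected only if the proposed substitute is the right one.
    corrected = sum(1 for key, right in gold.items() if proposed.get(key) == right)
    return corrected / len(gold)


# Hypothetical toy data: four target assertions, two corrected with the right substitute.
gold = {"t1": "ex:Berlin", "t2": "ex:Paris", "t3": "ex:Rome", "t4": "ex:Oslo"}
proposed = {"t1": "ex:Berlin", "t2": "ex:Lyon", "t3": "ex:Rome", "t4": None}

print(f"{correction_rate(proposed, gold):.1%}")  # prints "50.0%"
```

Under this reading, a target that receives a wrong substitute and a target that receives no substitute both count as uncorrected.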
Highlights
Knowledge Bases (KBs), whose variants are often known as Knowledge Graphs [22], are playing an increasingly important role in applications such as search engines, question answering, common sense reasoning and data integration.
[...] KB whose TBox is defined by clinical experts and whose ABox is extracted from medical articles by open information extraction tools, and (iii) mapping assertions in a music KB that is constructed [...]
We find that filtering with either assertion prediction (AP) or constraint-based validation (CV) can improve the correction rate.
Summary
Chen et al. / An assertion and alignment correction framework for large scale knowledge bases
Knowledge Bases (KBs), whose variants are often known as Knowledge Graphs [22], are playing an increasingly important role in applications such as search engines, question answering, common sense reasoning and data integration. They include general purpose KBs such as Wikidata [60], DBpedia [2] and NELL [38], as well as domain specific KBs such as Discogs and MusicBrainz. [...] knowledge engineering [65], and they often include knowledge from multiple sources that has been integrated via some alignment procedure [6,67]. Notwithstanding their important role, these KBs still suffer from various quality issues, including constraint violations and erroneous assertions [15,46], that negatively impact their usefulness and usability. It may use a more expressive language such as