Abstract
This paper proposes a simple knowledge base enrichment method based on parse tree patterns with a semantic filter. Parse tree patterns are superior to the lexical patterns commonly used in previous studies in that they can handle long-distance dependencies among words. In addition, the proposed semantic filter, which combines WordNet-based similarity and word embedding similarity, removes parse tree patterns that are semantically irrelevant to the meaning of a target relation. In our experiments using the DBpedia ontology and the Wikipedia corpus, the average accuracy of the top 100 parse tree patterns for ten relations is 68%, which is 16% higher than that of lexical patterns, and the average accuracy of the newly extracted triples is 60.1%. These results indicate that the proposed method produces more relevant patterns for the relations of the seed knowledge, and that these patterns therefore generate more accurate triples.
Highlights
The World Wide Web contains abundant knowledge by virtue of the contributions of its great number of users, and the knowledge is being utilized in diverse fields.
The DBpedia ontology is used as a knowledge base, and the Wikipedia corpus is used as a corpus for generating patterns and extracting new knowledge triples.
All the triples of the DBpedia ontology that correspond to the ten predicates are employed as seed triples.
Summary
The World Wide Web contains abundant knowledge by virtue of the contributions of its great number of users, and this knowledge is being utilized in diverse fields. In lexical pattern-based enrichment, when a piece of seed knowledge is expressed as a triple of two entities and their relation, the intervening lexical sequence between the two entities in a sentence becomes a pattern candidate. Such lexical patterns have been reported to show reasonable performance in many knowledge enrichment systems [4,6]. […] {arg2}” from the sentence “Eve is a daughter of Selene and Michael.” becomes a pattern for the relation workFor, although the pattern does not deliver the meaning of workFor at all. To generate high-quality patterns, it is therefore important to filter out the pattern candidates that do not deliver the meaning of the relation in a seed triple. The final semantic confidence between a pattern and the relation in a seed triple is defined as the average similarity between the words of the pattern and those of the relation.
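The two steps described in the summary, taking the intervening lexical sequence as a pattern candidate and scoring it by its average word similarity to the relation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the sentences, the entities, the toy word vectors, the all-pairs averaging, and the 0.3 threshold are all hypothetical, and plain cosine similarity over hand-made vectors stands in for the combined WordNet-based and word embedding similarity, whose exact form is not given here.

```python
import math
from itertools import product

# Hypothetical toy word vectors; a real system would load trained embeddings
# and combine them with a WordNet-based similarity.
TOY_VECTORS = {
    "work": (1.0, 0.0, 0.0),
    "works": (0.9, 0.1, 0.0),
    "for": (0.0, 1.0, 0.0),
    "and": (0.0, 0.0, 1.0),
}


def lexical_pattern(sentence, arg1, arg2):
    """Return the text between the two entities, with placeholders, or None."""
    i, j = sentence.find(arg1), sentence.find(arg2)
    if i < 0 or j < 0:
        return None  # at least one entity does not occur in the sentence
    if i < j:
        return "{arg1}" + sentence[i + len(arg1):j] + "{arg2}"
    return "{arg2}" + sentence[j + len(arg2):i] + "{arg1}"


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_similarity(w1, w2):
    """Stand-in word similarity; unknown words score 0.0."""
    if w1 not in TOY_VECTORS or w2 not in TOY_VECTORS:
        return 0.0
    return cosine(TOY_VECTORS[w1], TOY_VECTORS[w2])


def semantic_confidence(pattern, relation_words, sim=word_similarity):
    """Average similarity over all (pattern word, relation word) pairs."""
    words = pattern.replace("{arg1}", " ").replace("{arg2}", " ").lower().split()
    pairs = list(product(words, relation_words))
    if not pairs:
        return 0.0
    return sum(sim(p, r) for p, r in pairs) / len(pairs)


# Hypothetical seed triples for the relation workFor, split into its words.
relation_words = ["work", "for"]
candidates = [
    lexical_pattern("Alice works for Acme.", "Alice", "Acme"),
    lexical_pattern("Alice and Bob attended the meeting.", "Alice", "Bob"),
]

THRESHOLD = 0.3  # assumed cut-off, chosen only for this toy example
kept = [p for p in candidates if semantic_confidence(p, relation_words) >= THRESHOLD]
print(kept)
```

In this toy setting the candidate "{arg1} and {arg2}" shares no meaning with the words of workFor and is dropped, while "{arg1} works for {arg2}" is kept. Passing a different function as `sim` is the place where a WordNet-based or combined similarity would be plugged in.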