Abstract

This paper proposes a simple knowledge base enrichment method based on parse tree patterns with a semantic filter. Parse tree patterns are superior to the lexical patterns commonly used in many previous studies in that they can handle long-distance dependencies among words. In addition, the proposed semantic filter, which combines WordNet-based similarity and word embedding similarity, removes parse tree patterns that are semantically irrelevant to the meaning of a target relation. According to our experiments using the DBpedia ontology and the Wikipedia corpus, the average accuracy of the top 100 parse tree patterns for ten relations is 68%, which is 16% higher than that of lexical patterns, and the average accuracy of the newly extracted triples is 60.1%. These results show that the proposed method produces more relevant patterns for the relations of seed knowledge, and thus that the patterns generate more accurate triples.

Highlights

  • The World Wide Web contains abundant knowledge by virtue of the contributions of its great number of users, and this knowledge is being utilized in diverse fields

  • The DBpedia ontology is used as a knowledge base and the Wikipedia corpus is used as a corpus for generating patterns and extracting new knowledge triples

  • All the triples of the DBpedia ontology that correspond to the ten predicates are employed as seed triples


Summary

Introduction

The World Wide Web contains abundant knowledge by virtue of the contributions of its great number of users, and this knowledge is being utilized in diverse fields. That is, when a piece of seed knowledge is expressed as a triple of two entities and their relation, the intervening lexical sequence between the two entities in a sentence becomes a pattern candidate. Such lexical patterns have been reported to show reasonable performance in many knowledge enrichment systems [4,6]. […] {arg2}” from the sentence “Eve is a daughter of Selene and Michael.” becomes a pattern for the relation workFor, although the pattern does not deliver the meaning of workFor at all. To generate high-quality patterns, it is therefore important to filter out the pattern candidates that do not deliver the meaning of the relation in a seed triple. The final semantic confidence between a pattern and the relation in a piece of seed knowledge is defined as the average similarity between the words of the pattern and those of the relation.
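The two ideas in this summary, a lexical pattern candidate taken from the words between two entities and a semantic confidence computed as an average word similarity, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the function names, the camel-case splitting of the relation name, the all-pairs averaging, and the toy similarity table with its values. In particular, `toy_similarity` is only a stand-in for the combined WordNet-based and word embedding similarity, whose exact form is not given in this summary.

```python
import re
from itertools import product
from statistics import mean
from typing import Callable, List, Optional


def lexical_pattern(sentence: str, arg1: str, arg2: str) -> Optional[str]:
    """Build a lexical pattern candidate from the words between two entities.

    Returns None when arg1 is absent or arg2 does not occur after arg1.
    """
    start = sentence.find(arg1)
    if start < 0:
        return None
    end = sentence.find(arg2, start + len(arg1))
    if end < 0:
        return None
    between = sentence[start + len(arg1):end].strip()
    return f"{{arg1}} {between} {{arg2}}"


def relation_words(relation: str) -> List[str]:
    """Split a camel-case relation name such as 'workFor' into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z]*", relation)]


def semantic_confidence(
    pattern_words: List[str],
    rel_words: List[str],
    similarity: Callable[[str, str], float],
) -> float:
    """Average similarity over all (pattern word, relation word) pairs.

    Averaging over all pairs is an assumption; the source only says
    'average similarity between the words'.
    """
    pairs = list(product(pattern_words, rel_words))
    if not pairs:
        return 0.0
    return mean(similarity(p, r) for p, r in pairs)


# Hypothetical similarity values, for illustration only.
TOY_SIM = {("employee", "work"): 0.8}


def toy_similarity(a: str, b: str) -> float:
    """Symmetric lookup in the toy table; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get((a, b), TOY_SIM.get((b, a), 0.0))


if __name__ == "__main__":
    sentence = "Eve is a daughter of Selene and Michael."
    print(lexical_pattern(sentence, "Eve", "Selene"))
    print(semantic_confidence(["daughter"], ["work"], toy_similarity))
    print(semantic_confidence(["employee"], ["work"], toy_similarity))
```

Under these toy values, a candidate whose content word is "daughter" receives a lower confidence for workFor than one containing "employee", so a threshold on the confidence would discard it. This is the filtering behaviour the summary describes, not a reproduction of the paper's scoring.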

Knowledge Base Enrichment
Semantic Similarity of Words
Overall Structure of Knowledge Enrichment
Generation of Pattern Candidates
Semantic Similarity as a Semantic Filter
WordNet-Based Similarity
Similarity at Word Embedding Space
New Knowledge Extraction
Experiments
Evaluation of Parse Tree Patterns
Performance of Semantic Filter
Evaluation of Newly Extracted Knowledge
Comparison with Previous Work
Proposed Method
Conclusions and Future Work