Abstract

This paper proposes a novel method for denoising the knowledge in large-scale Chinese online encyclopedias. First, the initial similarity of the triples is computed by a method that integrates Edit-Distance with the TongYiCiCiLin similarity algorithm. Second, a novel nuclear field-like potential function over the Infobox knowledge triples is constructed by means of the semantic tags of Chinese encyclopedia entries. Finally, large-scale knowledge triple clustering and denoising are performed with the improved potential function, in order to minimize the influence of massive repetition and ambiguity in the Chinese open encyclopedia Knowledge Base (KB). The proposed method addresses the problems of semantic duplication, ambiguity and inappropriate classification of knowledge triples that arise when constructing Chinese KBs. The experimental results indicate that the open-domain Chinese encyclopedia KBs constructed by the proposed method outperform those built by state-of-the-art methods.
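The first and last steps above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method of the paper: the toy synonym groups stand in for the TongYiCiCiLin thesaurus, the weight `alpha` and the width `sigma` are hypothetical parameters, and the Gaussian-style short-range potential is a generic stand-in, since the improved potential function is not given in this excerpt.

```python
import math

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest

# Hypothetical toy synonym groups; a real system would query TongYiCiCiLin.
TOY_SYNONYM_GROUPS = [
    {"出生地", "出生地点"},
    {"别名", "别称", "又名"},
]

def synonym_similarity(a: str, b: str) -> float:
    """1.0 if both terms are equal or share a toy synonym group, else 0.0."""
    if a == b:
        return 1.0
    return 1.0 if any(a in g and b in g for g in TOY_SYNONYM_GROUPS) else 0.0

def combined_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """Weighted blend of surface and synonym similarity (alpha is assumed)."""
    return alpha * edit_similarity(a, b) + (1 - alpha) * synonym_similarity(a, b)

def potential(index: int, items: list[str], sigma: float = 0.5) -> float:
    """Sum of Gaussian-style contributions from all other items.

    Distance is taken as 1 - similarity, so near-duplicates contribute
    close to 1 and unrelated items contribute close to 0.
    """
    total = 0.0
    for j, other in enumerate(items):
        if j == index:
            continue
        d = 1.0 - combined_similarity(items[index], other)
        total += math.exp(-((d / sigma) ** 2))
    return total

if __name__ == "__main__":
    attributes = ["别名", "别称", "出生地"]
    for i, name in enumerate(attributes):
        print(name, round(potential(i, attributes), 4))
```

In this sketch, attribute names with near-duplicates end up with a higher potential than isolated ones, which is the kind of signal a potential-based clustering step could use to pick cluster centres and merge redundant triples.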

Highlights

  • The vision of the Semantic Web is to create a “Web of Data”, so that a machine is able to understand the semantic information on the internet [1].

  • To improve the precision of the knowledge base (KB), Wang et al. [18] propose a self-expanded learning method that predicts the semantic relations between subjects and objects while extracting knowledge triples from the plain text of entry web pages in Chinese encyclopedias.

  • A large amount of noisy knowledge remains in the Chinese online encyclopedia KBs built by previous work. It is mainly caused by the ambiguity and inappropriate classification of Infobox triples, which in turn stem from the open, collaborative nature of online encyclopedias, their tag settings, and similar factors.


Summary

INTRODUCTION

The vision of the Semantic Web is to create a “Web of Data”, so that a machine is able to understand the semantic information on the internet [1]. With the development and application of internet technology, the internet has gradually become an open platform for releasing, communicating and sharing information, and information inquiry and knowledge acquisition have gradually moved from offline to online. With the rapid development of Web 3.0, people's lives have entered the era of big data and knowledge graphs (KGs). Flourishing online encyclopedias provide a high-quality data source. As Chinese online encyclopedias have emerged and matured, the academic community has begun to study automatic knowledge extraction and KB construction. However, the large amount of synonymy, ambiguity and improper classification of knowledge in Chinese entries has led to low efficiency and precision in Chinese online encyclopedia KBs.

RELATED WORK
SYSTEM FRAMEWORK
Findings
CONCLUSION
