Abstract

The open, collaborative nature of online encyclopedias and the large number of ambiguous entries cause many Infobox knowledge triples to be classified inappropriately, so large-scale knowledge must be refined and denoised to improve the precision of a Knowledge Base (KB). Because KBs contain an enormous number of triples, serial knowledge denoising and disambiguation incur excessive computing time. Existing knowledge refinement and disambiguation techniques are limited in scalability and time efficiency, and there has been little research on parallel knowledge refinement in distributed environments. This paper therefore proposes a novel MapReduce-based parallel algorithm for large-scale Chinese knowledge refinement, which improves overall system computing speed by parallelizing the serial algorithm. Building on the original serial refinement algorithm, which enhances the precision of encyclopedia-oriented KBs, the results show that the proposed parallel denoising algorithm additionally gives the system good scalability and high speedup.

