Abstract

Knowledge bases (KBs), such as Probase and ConceptNet, play an important role in many natural language processing tasks. English knowledge bases are clearly superior in scale and quality to those of resource-poor languages such as Chinese. Translating English KBs into Chinese is an effective way to expand Chinese KBs with English KB resources. In this direction, two major challenges are how to model more structural semantics to improve translation quality and how to avoid labor-intensive feature engineering. We address these challenges with a neural network approach that learns tree representations from different structural features. We also build a new dataset for English-Chinese KB translation from Probase and ConceptNet, and compare our approach with several baselines on it. Experimental results show that the proposed method improves translation accuracy over the baseline methods. We also translate Probase and ConceptNet into Zh-Probase and Zh-ConceptNet with our model and release them to the public, in the hope of speeding up research on Chinese natural language processing tasks.

Highlights

  • Knowledge bases like ConceptNet [1] and Probase [2] have long played a central role in artificial intelligence

  • Most knowledge bases, such as Probase and ConceptNet, are composed of triples, each of which is a fact consisting of two arguments and one relation, and these triples can be clustered into trees, where all triples inside share one argument

  • We construct trees based on triples, design a neural network to capture different features from the tree structure, and try to score each candidate tree to get the best translation
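The tree construction and candidate scoring described in the highlights can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the shared argument is the first element of each triple, the candidate dictionary is hypothetical, and `toy_score` is a hand-written stand-in for the neural scorer.

```python
from collections import defaultdict
from itertools import product

def cluster_triples(triples):
    """Group (head, relation, tail) triples into trees keyed by the shared head.

    Assumption: the shared argument is the head; the source only says that
    all triples in a tree share one argument.
    """
    trees = defaultdict(list)
    for head, relation, tail in triples:
        trees[head].append((relation, tail))
    return dict(trees)

def best_translation(root, branches, candidates, score):
    """Enumerate candidate translated trees and return the highest-scoring one.

    candidates: hypothetical dict mapping an English term to Chinese candidates.
    score: callable(root_zh, branches_zh) -> float, standing in for the model.
    """
    terms = [root] + [tail for _, tail in branches]
    options = [candidates[term] for term in terms]
    best_tree, best_score = None, float("-inf")
    for combo in product(*options):
        root_zh, tails_zh = combo[0], combo[1:]
        branches_zh = [(rel, t) for (rel, _), t in zip(branches, tails_zh)]
        s = score(root_zh, branches_zh)
        if s > best_score:
            best_tree, best_score = (root_zh, branches_zh), s
    return best_tree

# Toy data: "bank" is ambiguous between a financial institution and a riverbank.
triples = [
    ("bank", "IsA", "financial institution"),
    ("bank", "RelatedTo", "money"),
]
candidates = {
    "bank": ["银行", "河岸"],
    "financial institution": ["金融机构"],
    "money": ["钱"],
}
# Hand-written compatibility pairs; a trained model would replace this.
compatible = {("银行", "金融机构"), ("银行", "钱")}

def toy_score(root_zh, branches_zh):
    # Count how many (root, tail) pairs are marked compatible.
    return sum((root_zh, tail) in compatible for _, tail in branches_zh)

trees = cluster_triples(triples)
best = best_translation("bank", trees["bank"], candidates, toy_score)
```

Scoring the whole tree at once lets the sibling triples disambiguate the shared argument: the pairing with "金融机构" and "钱" favors "银行" over "河岸".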

Summary

INTRODUCTION

Knowledge bases like ConceptNet [1] and Probase [2] have long played a central role in artificial intelligence. Although some Chinese taxonomic knowledge bases, such as CN-Probase [9] and zhishi.me [10], have been built, they still suffer from two serious problems. First, because their data come from online encyclopedias, their concepts are not as numerous or as broad as those in Probase, which is the basis for some explicit topic model applications [8]. To handle the disambiguation problem, an adaptive neural network has been adopted to translate English knowledge bases into Chinese; it maps both English and Chinese triples into the same semantic space and chooses the nearest Chinese triple as the translation of each English triple [13]. We show the effectiveness of the combined features in our model. Based on this neural network, we translate Probase and ConceptNet into Zh-Probase and Zh-ConceptNet, respectively, and show that their coverage and accuracy are both satisfactory. In the following sections, we list some highlights, present and formulate the focal problem, present the design of our proposed framework and the experimental results, discuss related work, and conclude with a discussion and future directions.
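The nearest-triple selection mentioned above (mapping English and Chinese triples into one semantic space and choosing the nearest Chinese triple) can be sketched as follows. This is only an illustration under stated assumptions: the two-dimensional vectors are made up, and cosine similarity is an assumed notion of "nearest", since the text does not name the metric.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_triple(query_vec, candidate_vecs):
    """Return the candidate triple whose embedding is most similar to the query."""
    return max(candidate_vecs, key=lambda t: cosine(query_vec, candidate_vecs[t]))

# Made-up embedding of the English triple ("bank", "IsA", "financial institution").
english_vec = [1.0, 0.0]

# Made-up embeddings of two Chinese candidate triples in the same space.
chinese_vecs = {
    ("银行", "IsA", "金融机构"): [0.9, 0.1],
    ("河岸", "IsA", "金融机构"): [0.1, 0.9],
}

chosen = nearest_triple(english_vec, chinese_vecs)
```

In a real system the vectors would come from a trained encoder; here they are fixed by hand so that the selection step can be followed in isolation.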

PROBLEM FORMULATION
MODEL TRAINING AND PREDICTION
APPLICATION
2) Result
CONCEPTNET TRANSLATION
CONCLUSION AND FUTURE