Automatic extraction of Uyghur domain concepts based on multi‐feature for ontology extension

Hankiz Yilahun, Kudret Abdurahman, Askar Hamdulla, Seyyare Imam

doi:10.1049/iet-net.2018.5240

Abstract

In the internet age, ontology, as a conceptual model of knowledge organisation, has become a research hotspot. Ontology extension expands an ontology by adding new concepts and discovering the relationships between concepts in the existing ontology. To improve the automation and accuracy of ontology concept extraction in the Uyghur language, the authors propose a new method for automatically extracting concepts from a text collection. Based on the characteristics of Uyghur domain ontology concepts, the text is first preprocessed; the inter-word correlation is then calculated by fusing multiple features, such as Mi, Cd and Ea; finally, domain terms and concepts are automatically extracted using the term frequency–inverse document frequency algorithm. Experimental results show that the proposed multi-feature method improves on other methods in terms of precision and recall. The results also demonstrate the feasibility and effectiveness of the authors’ method for extracting domain concepts.
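
As an illustration of the final ranking step only, the following is a minimal sketch of plain term frequency–inverse document frequency scoring over already tokenised documents. It is not the authors' implementation: the function names `tf_idf_scores` and `top_terms`, the choice of taking the maximum score over documents, and the toy tokens are all assumptions made for this example, and the multi-feature inter-word correlation step is not shown because the abstract does not define it.

```python
import math
from collections import Counter


def tf_idf_scores(documents):
    """Map each term to its highest TF-IDF score across the documents.

    `documents` is a list of token lists (assumed to be preprocessed).
    Taking the maximum over documents is an illustrative choice.
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = {}
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        for term, count in counts.items():
            tf = count / total                       # relative term frequency
            idf = math.log(n_docs / doc_freq[term])  # 0 if term is in every doc
            scores[term] = max(scores.get(term, 0.0), tf * idf)
    return scores


def top_terms(documents, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = tf_idf_scores(documents)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus; a real run would use segmented Uyghur text.
docs = [
    ["ontology", "concept", "ontology"],
    ["concept", "text"],
    ["concept", "term"],
]
best = top_terms(docs, 1)
```

In this toy corpus, a token that appears in every document receives a score of zero, which is why TF-IDF favours terms concentrated in a few domain documents over words that are common everywhere.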

Full Text