Semi-Automatic Corpus Expansion and Extraction of Uyghur-Named Entities and Relations Based on a Hybrid Method

Ayiguli Halike, Kahaerjiang Abiderexiti, Tuergen Yibulayin

doi:10.3390/info11010031

Abstract

Relation extraction is an important task with many applications in natural language processing, such as structured knowledge extraction, knowledge graph construction, and the construction of automatic question answering systems. However, relatively little past work has focused on corpus construction and relation extraction for Uyghur named entities, so relation extraction research for the language is very limited and annotated relation data are scarce. This article addresses the issue by proposing a hybrid Uyghur named-entity relation extraction method. It combines a conditional random field model, which suggests annotations based on extracted relations, with a set of rules applied by human annotators, so that the size of the Uyghur corpus can be increased rapidly. We integrate our relation extraction method into an existing annotation tool and, with the help of human correction, implement Uyghur relation extraction and expand the existing corpus. Experiments on an existing Uyghur corpus demonstrate the effectiveness of the approach: our method achieves a maximum weighted average between precision and recall of 61.34%. The proposed method achieves state-of-the-art results on entity and relation extraction tasks in Uyghur.
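The "weighted average between precision and recall" quoted above is most plausibly the standard F-measure, that is, the weighted harmonic mean of the two scores. The abstract does not state the formula or the weighting, so the following is only a minimal sketch under that assumption. The function name `f_measure`, the `beta` parameter, and the example precision and recall values are hypothetical illustrations, not figures from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1 weights both equally (the usual F1 score); beta > 1
    favours recall and beta < 1 favours precision.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both scores are zero, so the harmonic mean is defined as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical scores for illustration only (assumption, not from the paper).
example_precision = 0.60
example_recall = 0.50
example_f1 = f_measure(example_precision, example_recall)
print(f"F1 = {example_f1:.4f}")
```

Because the harmonic mean is pulled toward the lower of the two scores, a system cannot reach a high F-measure by maximizing precision alone or recall alone.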

Highlights

Extracting entities and relations from unstructured texts is crucial for knowledge base construction in natural language processing (NLP) [1,2,3,4], intelligent question answering systems [5,6], and search engines.
Projects such as DBPedia [7], YAGO [8], Kylin/KOG [9,10], and BabelNet [11] have focused on building knowledge graphs by using entities and relations that are extracted from unstructured data obtained from Wikipedia, which is one of the largest sources of multilingual language data on the internet.
We focus on the issues raised during Uyghur knowledge graph construction and discuss the main challenges that must be addressed in this task.

Summary

Introduction

Extracting entities and relations from unstructured texts is crucial for knowledge base construction in natural language processing (NLP) [1,2,3,4], intelligent question answering systems [5,6], and search engines. The automatic construction of knowledge graphs based on unstructured data has attracted significant interest. Projects such as DBPedia [7], YAGO [8], Kylin/KOG [9,10], and BabelNet [11] have focused on building knowledge graphs by using entities and relations that are extracted from unstructured data obtained from Wikipedia, which is one of the largest sources of multilingual language data on the internet. This situation is problematic for Uyghur because relatively little past work has focused on corpus construction, and that work has not covered the extraction of Uyghur named-entity relations, resulting in very limited relation extraction research and a deficiency of annotated relation data.

Information 2020, 11, 31; doi:10.3390/info11010031; www.mdpi.com/journal/information

Journal: Information
Publication Date: Jan 6, 2020
Citations: 3
License type: CC BY 4.0
