Abstract
Relation extraction is an important task with many applications in natural language processing, such as structured knowledge extraction, knowledge graph construction, and automatic question answering system construction. However, relatively little past work has focused on the construction of the corpus and extraction of Uyghur-named entity relations, resulting in a very limited availability of relation extraction research and a deficiency of annotated relation data. This issue is addressed in the present article by proposing a hybrid Uyghur-named entity relation extraction method that combines a conditional random field model for making suggestions regarding annotation based on extracted relations with a set of rules applied by human annotators to rapidly increase the size of the Uyghur corpus. We integrate our relation extraction method into an existing annotation tool, and, with the help of human correction, we implement Uyghur relation extraction and expand the existing corpus. The effectiveness of our proposed approach is demonstrated based on experimental results by using an existing Uyghur corpus, and our method achieves a maximum weighted average between precision and recall of 61.34%. The method we proposed achieves state-of-the-art results on entity and relation extraction tasks in Uyghur.
Highlights
Extracting entities and relations from unstructured texts is crucial for knowledge base construction in natural language processing (NLP) [1,2,3,4], intelligent question answering systems [5,6], and search engines
Projects such as DBPedia [7], YAGO [8], Kylin/KOG [9,10], and BabelNet [11] have focused on building knowledge graphs by using entities and relations that are extracted from unstructured data obtained from Wikipedia, which is one of the largest sources of multilingual language data on the internet
We focus on the issues raised during Uyghur knowledge graph construction and discuss the main challenges that must be addressed in this task
Summary
Extracting entities and relations from unstructured texts is crucial for knowledge base construction in natural language processing (NLP) [1,2,3,4], intelligent question answering systems [5,6], and search engines. The automatic construction of knowledge graphs based on unstructured data has attracted significant interest. Projects such as DBPedia [7], YAGO [8], Kylin/KOG [9,10], and BabelNet [11] have focused on building knowledge graphs by using entities and relations that are extracted from unstructured data obtained from Wikipedia, which is one of the largest sources of multilingual language data on the internet. The above-discussed issue is problematic for Uyghur because relatively little past work has focused on the construction of the corpus and has not included research about the extraction of Uyghur-named entity relations, resulting in the very limited availability of relation extraction research and a deficiency of Information 2020, 11, 31; doi:10.3390/info11010031 www.mdpi.com/journal/information
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.