Abstract
Open domain relation prediction is an important task in triples extraction. When constructing large-scale knowledge graph systems, structured data alone is not sufficient: triples must also be extracted automatically from large amounts of unstructured text to expand the entities and relations. Although many English open relation prediction methods have achieved good performance, a high-performance system for open domain Chinese triples extraction remains undeveloped, owing to the lack of large-scale Chinese annotation corpora and the difficulty of Chinese language processing. In this paper, we propose an integrated open domain Chinese triples hierarchical extraction method (CTHE) to solve this problem, drawing on the advantages of the Bi-LSTM-CRF and Att-Bi-GRU models built on the pre-trained BERT encoding model. The method recognizes named entities in Chinese sentences to establish entity pairs, and then performs hierarchical extraction of specific and open relations based on a user-defined schema library and an attention mechanism. The experimental results demonstrate the effectiveness of the method, which achieved stable performance on the test dataset and better precision and F1-score than state-of-the-art Chinese open domain triples extraction methods. Furthermore, we establish a large-scale annotated dataset for the Chinese named entity recognition (NER) task, which provides support for further research on Chinese NER.
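The hierarchical flow described in the abstract (recognize entities, form entity pairs, try schema-defined specific relations first, then fall back to open relations) can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`extract_triples`, `toy_specific`, `toy_open`, the schema layout, and the English example sentences) is a hypothetical stand-in. In particular, the two toy functions only mark where the paper's neural components would sit; they use simple string matching, not Bi-LSTM-CRF, Att-Bi-GRU, or BERT.

```python
from itertools import permutations
from typing import Callable, Dict, List, Optional, Tuple

Entity = Tuple[str, str]        # (surface text, entity type), e.g. output of an NER step
Triple = Tuple[str, str, str]   # (head, relation, tail)
Schema = Dict[Tuple[str, str], List[str]]  # (head type, tail type) -> candidate relations

def extract_triples(
    sentence: str,
    entities: List[Entity],
    schema: Schema,
    classify_specific: Callable[[str, Entity, Entity, List[str]], Optional[str]],
    extract_open: Callable[[str, Entity, Entity], Optional[str]],
) -> List[Triple]:
    """Two-level extraction: schema-bound relations first, open relations second."""
    triples: List[Triple] = []
    # Every ordered pair of recognized entities is a candidate (head, tail).
    for head, tail in permutations(entities, 2):
        relation = None
        candidates = schema.get((head[1], tail[1]))
        if candidates:
            # Level 1: choose among relations the schema allows for this type pair.
            relation = classify_specific(sentence, head, tail, candidates)
        if relation is None:
            # Level 2: no schema relation applies, so try an open relation.
            relation = extract_open(sentence, head, tail)
        if relation is not None:
            triples.append((head[0], relation, tail[0]))
    return triples

# Hypothetical cue phrases used only by the toy classifier below.
CUES = {"born_in": "born in"}

def toy_specific(sentence, head, tail, candidates):
    """Stand-in for a relation classifier: first candidate whose cue occurs."""
    for rel in candidates:
        if CUES.get(rel, "") and CUES[rel] in sentence:
            return rel
    return None

def toy_open(sentence, head, tail):
    """Stand-in for open relation extraction: text between the two mentions."""
    start, end = sentence.find(head[0]), sentence.find(tail[0])
    if start == -1 or end == -1 or start >= end:
        return None
    span = sentence[start + len(head[0]):end].strip()
    return span or None

SCHEMA: Schema = {("PER", "LOC"): ["born_in"]}

if __name__ == "__main__":
    print(extract_triples("Ada was born in London",
                          [("Ada", "PER"), ("London", "LOC")],
                          SCHEMA, toy_specific, toy_open))
    print(extract_triples("Ada admired Byron",
                          [("Ada", "PER"), ("Byron", "PER")],
                          SCHEMA, toy_specific, toy_open))
```

In the first call the (PER, LOC) pair is covered by the schema, so the relation comes from the specific level. In the second, no schema entry exists for (PER, PER), so the open level supplies the relation phrase. The point of the ordering is that the schema library handles the relation types a user cares about precisely, while the fallback keeps coverage open-ended.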
Highlights
To automatically expand new knowledge, obtaining new structured knowledge from massive amounts of unstructured data has become a popular research issue
This paper proposes an integrated open domain Chinese triples hierarchical extraction method to combine the advantages of deep learning with unsupervised algorithms and effectively expand the generalization ability of the open relation triples extraction model
We propose an integrated open domain Chinese triples hierarchical extraction method with
Summary
To automatically expand new knowledge, obtaining new structured knowledge from massive amounts of unstructured data has become a popular research issue. Existing methods still fall short of the complex challenges that open relation triples extraction poses in real-world scenarios, so effective methods are needed to address the problems that arise from actual scenario requirements. Traditional entity-relation triples extraction methods usually assume a pre-defined, closed relation set, and previous research has converted the task into a relation classification problem with good results. In the open relation triples extraction scenario, however, the text contains a large number of open entity relations, far more than the pre-defined relation types. In this case, the traditional relation classification models cannot directly and effectively obtain the new type of