Abstract

In recent years, many researchers have used word lexicons to incorporate word information into character-based models to improve the performance of Chinese relation extraction (RE). For example, Li et al. proposed the MG-Lattice model in 2019 and achieved state-of-the-art (SOTA) results. However, MG-Lattice still suffers from information loss due to its model structure, which hurts Chinese RE performance. This paper proposes an adaptive method that incorporates word information at the embedding layer: a word lexicon is used to merge all words that match each character into a character-based model, which addresses the information loss problem of MG-Lattice. The method can be combined with other general neural network architectures and is transferable. Experiments on two benchmark Chinese RE datasets show that our method achieves inference up to 12.9 times faster than the SOTA model, along with better performance. The results also show that combining this method with the BERT pretrained model effectively supplements the information obtained from the pretrained model, further improving Chinese RE performance.
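The idea of merging every lexicon word that matches a character into that character's representation can be illustrated with a minimal sketch. It uses the example sentence from the introduction. The toy lexicon, the function names `match_lexicon_words` and `merge_word_info`, the `max_word_len` limit, and the use of mean pooling are all assumptions made for illustration; this excerpt does not give the paper's actual weighting scheme.

```python
from collections import defaultdict


def match_lexicon_words(sentence, lexicon, max_word_len=4):
    """For each character position, list every lexicon word that covers it.

    Only multi-character substrings (length 2..max_word_len) are looked up.
    """
    matches = defaultdict(list)
    n = len(sentence)
    for start in range(n):
        for end in range(start + 2, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                # Attach the word to every character it spans.
                for pos in range(start, end):
                    matches[pos].append(word)
    return [matches[i] for i in range(n)]


def merge_word_info(char_vec, word_vecs, word_dim):
    """Concatenate a character vector with the mean of its matched word vectors.

    Mean pooling is an illustrative assumption, not the paper's weighting.
    Characters with no matched word get a zero vector of size word_dim.
    """
    if word_vecs:
        pooled = [
            sum(vec[d] for vec in word_vecs) / len(word_vecs)
            for d in range(word_dim)
        ]
    else:
        pooled = [0.0] * word_dim
    return list(char_vec) + pooled


# Hypothetical toy lexicon covering the example sentence.
lexicon = {"武汉", "研究", "研究所", "所有", "杜鹃"}
sentence = "武汉研究所有杜鹃"
matches = match_lexicon_words(sentence, lexicon)

for char, words in zip(sentence, matches):
    print(char, words)
```

In this sketch the character "所" keeps both "研究所" and "所有", so no single segmentation is committed to at the embedding layer; a downstream model is left to decide which word information is useful.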

Highlights

  • Relation extraction (RE) is a subtask of information extraction that aims to extract semantic relations between entity pairs in natural language sentences

  • Each character of the input sequence is mapped to a dense vector, and a dictionary-matching method introduces word information, which is weighted and merged into the character representation as a lexical enhancement

  • Multiple standard evaluation metrics are applied in the experiments, including precision, recall, F1-score, and area under the curve (AUC)
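As a reminder of how three of these metrics relate, the following is a minimal sketch of precision, recall, and F1-score for relation labels. The function name `precision_recall_f1`, the micro-averaging, and the `"NA"` no-relation label are assumptions for illustration; this excerpt does not state how the paper averages its scores, and AUC is left out because the curve it refers to is not specified here.

```python
def precision_recall_f1(gold, pred, negative_label="NA"):
    """Micro-averaged precision, recall, and F1 over non-negative labels.

    A prediction counts as correct only if it equals the gold label and
    is not the (assumed) no-relation label.
    """
    correct = sum(
        1 for g, p in zip(gold, pred) if g == p and p != negative_label
    )
    predicted = sum(1 for p in pred if p != negative_label)
    actual = sum(1 for g in gold if g != negative_label)

    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels, for illustration only.
gold = ["A", "B", "NA", "A"]
pred = ["A", "NA", "A", "A"]
p, r, f1 = precision_recall_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so it only rises when both do.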


Summary

Introduction

Relation extraction (RE) is a subtask of information extraction that aims to extract semantic relations between entity pairs in natural language sentences. Unlike an English RE model, a Chinese RE model based on word input must first perform word segmentation, because Chinese sentences are not naturally segmented. A word-based model is therefore affected by the quality of the segmentation. As shown, the Chinese sentence “武汉研究所有杜鹃” (there are cuckoos in the Wuhan institute) has two entities, “武汉” (Wuhan) and “杜鹃” (cuckoos). In this case, the correct segmentation is “武汉(Wuhan)/研究所(institute)/有(have)/杜鹃(cuckoos)”. If the sentence is divided into “武汉(Wuhan)/研究(studies)/所

