Abstract

Recently, neural networks have shown promising results for named entity recognition (NER), but they require a large amount of labeled data for model training. When NER is applied to a new domain (the target domain), little or no labeled data is available, which makes domain NER much more difficult. Because NER has been studied for a long time, similar domains often already have well-labeled data (the source domain). In this paper, we therefore focus on domain NER by studying how to utilize labeled data from such a similar source domain for the new target domain. We design a kernel-function-based instance transfer strategy that retrieves similar labeled sentences from the source domain. Moreover, we propose an enhanced recurrent neural network (ERNN) that adds to the traditional RNN structure an additional layer combining the source domain labeled data. Comprehensive experiments are conducted on two datasets. A comparison among HMM, CRF and RNN shows that RNN performs best. When there is no labeled data in the target domain, compared to directly using the source domain labeled data without selecting transferred instances, our enhanced RNN approach improves the F1 measure from 0.8052 to 0.9328.
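The kernel function itself is not specified in this summary. As a rough illustration only, the following Python sketch selects source-domain labeled sentences by their similarity to the target domain, assuming a simple normalized bag-of-words kernel and a fixed similarity threshold; both are our assumptions, not the paper's choices.

```python
# Minimal sketch of kernel-based instance selection for transfer.
# Assumptions (not from the paper): a normalized bag-of-words kernel and a
# fixed similarity threshold.
from collections import Counter
import math

def kernel(sent_a, sent_b):
    """Normalized bag-of-words kernel between two tokenized sentences."""
    ca, cb = Counter(sent_a), Counter(sent_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
           math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def select_transfer_instances(source_corpus, target_sentences, threshold=0.3):
    """Keep source-domain labeled sentences that look similar to the target domain.

    source_corpus: list of (tokens, labels) pairs from the source domain.
    target_sentences: list of token lists from the (unlabeled) target domain.
    """
    selected = []
    for tokens, labels in source_corpus:
        score = max(kernel(tokens, t) for t in target_sentences)
        if score >= threshold:
            selected.append((tokens, labels))
    return selected
```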

Highlights

  • In recent years, Web data and knowledge management has attracted interest from both industry and research communities

  • We focus on domain NER for the politics text domain of Chinese high schools to support an automatic question answering (Q&A) system that is intended to take the national college entrance examination (NCEE) in the future

  • An instance transfer enhanced Recurrent Neural Network (ERNN) model, trained on instances transferred from a similar source domain (the People’s Daily corpus), is proposed for NER in the politics text target domain (a minimal sketch follows this list)
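The exact form of ERNN's additional layer is not detailed in this summary. A minimal PyTorch sketch of one plausible reading, in which a plain RNN tagger is extended with a linear layer that fuses each hidden state with a source-domain feature vector, is given below; the fusion design, the feature vector, and all dimensions are our assumptions, not the paper's specification.

```python
# Minimal sketch of an "enhanced" RNN tagger, assuming the additional layer
# fuses a source-domain feature vector with each RNN hidden state.
# The fusion design and all dimensions are assumptions, not the paper's spec.
import torch
import torch.nn as nn

class ERNNTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim, src_feat_dim, n_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hid_dim, batch_first=True)
        # Additional layer: combines RNN hidden states with source-domain features.
        self.fuse = nn.Linear(hid_dim + src_feat_dim, hid_dim)
        self.out = nn.Linear(hid_dim, n_tags)

    def forward(self, token_ids, src_feats):
        # token_ids: (batch, seq_len); src_feats: (batch, seq_len, src_feat_dim)
        h, _ = self.rnn(self.embed(token_ids))
        fused = torch.tanh(self.fuse(torch.cat([h, src_feats], dim=-1)))
        return self.out(fused)  # per-token tag scores
```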


Summary

INTRODUCTION

Web data and knowledge management has attracted interest from both industry and research communities. We propose an instance transfer based approach for domain NER with an enhanced Recurrent Neural Network (RNN). We design an instance transfer strategy to extract similar sentences from the People’s Daily corpus (the source domain); instances here mean labeled data, and politics text is our target domain. An instance transfer enhanced RNN (ERNN) model, trained on the instances transferred from the similar source domain (the People’s Daily corpus), is proposed for NER in the politics text target domain. Compared with the traditional RNN model, experimental results show that our ERNN with the instance transfer strategy improves the F1 measure from 80.52% to 93.28%. We further adopt a co-training approach using our proposed ERNN model and a CRF to take advantage of large amounts of unannotated target domain data, which achieves an F1 value of 94.02%.
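The co-training procedure is only named in this summary. The sketch below shows a generic co-training loop between two taggers such as ERNN and a CRF, assuming an agreement-based rule for selecting pseudo-labeled sentences; the selection rule and the number of rounds are our assumptions, not the paper's protocol.

```python
# Minimal sketch of co-training two NER taggers (e.g., ERNN and a CRF)
# on unannotated target-domain sentences. The agreement-based selection
# rule and the number of rounds are assumptions, not the paper's procedure.
def co_train(model_a, model_b, labeled, unlabeled, rounds=5):
    """model_a/model_b expose .fit(data) and .predict(sentence) -> tag list."""
    train_a, train_b = list(labeled), list(labeled)
    for _ in range(rounds):
        model_a.fit(train_a)
        model_b.fit(train_b)
        still_unlabeled = []
        for sent in unlabeled:
            tags_a, tags_b = model_a.predict(sent), model_b.predict(sent)
            if tags_a == tags_b:
                # Both models agree: treat the prediction as a confident
                # pseudo-label and add it to both training sets.
                train_a.append((sent, tags_a))
                train_b.append((sent, tags_b))
            else:
                still_unlabeled.append(sent)
        unlabeled = still_unlabeled
    return model_a, model_b
```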

RELATED WORK
OUR INSTANCE TRANSFER STRATEGY
OUR ERNN MODEL
EXPERIMENTS
EVALUATION AND SETUP
DOMAIN NER EXPERIMENTAL RESULTS
Findings
CONCLUSION AND FUTURE WORK

