Abstract
Recently, neural networks have shown promising results for named entity recognition (NER), but they require a large amount of labeled data for model training. When NER is applied to a new domain (the target domain), there is little or no labeled data, which makes domain NER much more difficult. Since NER has been studied for a long time, a similar domain often already has well-labeled data (the source domain). In this paper, we therefore focus on domain NER by studying how to utilize the labeled data from such a similar source domain for the new target domain. We design a kernel-function-based instance transfer strategy that selects similar labeled sentences from the source domain. Moreover, we propose an enhanced recurrent neural network (ERNN) that adds an additional layer combining the source domain labeled data into the traditional RNN structure. Comprehensive experiments are conducted on two datasets. The comparison among HMM, CRF and RNN shows that RNN performs better than the others. When there is no labeled data in the target domain, compared with directly using the source domain labeled data without selecting transferred instances, our enhanced RNN approach improves the F1 measure from 0.8052 to 0.9328.
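The abstract does not spell out the kernel or the selection procedure, so the following is only a minimal sketch of the instance transfer idea: score each labeled source-domain sentence by its kernel similarity to the (unlabeled) target-domain text and keep the most similar ones. The RBF kernel, the character n-gram vectors, and names such as `select_transfer_instances` and `top_k` are our assumptions, not the paper's.

```python
# Illustrative sketch only: the paper does not specify the exact kernel, so we
# assume an RBF kernel over character n-gram count vectors. Function and
# parameter names (select_transfer_instances, gamma, top_k) are hypothetical.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import rbf_kernel

def select_transfer_instances(source_sentences, target_sentences,
                              top_k=1000, gamma=0.1):
    """Pick source-domain labeled sentences most similar to the target domain."""
    vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 2))
    X = vectorizer.fit_transform(source_sentences + target_sentences).toarray()
    src, tgt = X[:len(source_sentences)], X[len(source_sentences):]

    # Kernel similarity of each source sentence to the target domain:
    # average RBF similarity against all (unlabeled) target sentences.
    sims = rbf_kernel(src, tgt, gamma=gamma).mean(axis=1)

    # Keep the top_k most similar source sentences as transferred instances;
    # their labels accompany them into the target-domain training set.
    order = np.argsort(-sims)[:top_k]
    return [source_sentences[i] for i in order]
```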
Highlights
In recent years, Web data and knowledge management has attracted interest from both industry and research communities.
We focus on domain NER for the politics text domain in Chinese high schools to support an automatic question answering (Q&A) system that will take the national college entrance examination (NCEE) in the future.
An instance transfer enhanced recurrent neural network (ERNN) model is proposed for NER in the politics text target domain; it is trained on instances transferred from a similar source domain (the People's Daily corpus), as sketched below.
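The highlights only state that an additional layer combines the transferred source-domain data with the traditional RNN structure, so the sketch below shows one plausible reading: per-token features derived from the source domain are fused with the RNN hidden state before tagging. The GRU cell, the layer sizes, and the concatenation point are assumptions rather than the paper's exact architecture.

```python
# Rough ERNN sketch, assuming the "additional layer" fuses a source-domain
# feature vector with the RNN hidden state before the tag classifier.
# ERNNSketch, src_feat_dim and the GRU choice are our assumptions.
import torch
import torch.nn as nn

class ERNNSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, src_feat_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        # Additional layer: combine the per-token RNN state with features
        # derived from the transferred source-domain labeled data.
        self.fuse = nn.Linear(hidden_dim + src_feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_tags)

    def forward(self, token_ids, src_features):
        # token_ids: (batch, seq_len); src_features: (batch, seq_len, src_feat_dim)
        h, _ = self.rnn(self.embed(token_ids))
        fused = torch.tanh(self.fuse(torch.cat([h, src_features], dim=-1)))
        return self.out(fused)  # per-token tag scores
```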
Summary
Web data and knowledge management has attracted interest from both industry and research communities. We propose an instance transfer based approach for domain NER with an enhanced recurrent neural network (RNN). We design an instance transfer strategy that extracts similar sentences from the People's Daily corpus (the source domain); here, instances mean labeled data, and politics text is our target domain. An instance transfer enhanced RNN (ERNN) model is proposed for NER in the politics text target domain; it is trained on the instances transferred from the similar source domain (the People's Daily corpus). Compared with the traditional RNN model, experimental results show that our ERNN with the instance transfer strategy improves the F1 measure from 80.52% to 93.28%. We further adopt a co-training approach that combines our proposed ERNN model and CRF to take advantage of large amounts of unannotated target domain data, which achieves an F1 value of 94.02%.
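The summary describes the co-training step only at a high level, so the following is a minimal sketch under standard co-training assumptions: each model labels the unannotated target-domain sentences, and each model's most confident predictions are added to the other model's training set. The `predict()` interface returning (tags, confidence), `top_n`, and the number of rounds are hypothetical, not the paper's exact setup.

```python
# Minimal co-training sketch, assuming each model exposes train() and a
# predict() that returns (tags, confidence). The confidence-based selection
# and the model interfaces are our assumptions.
def co_train(ernn, crf, labeled, unlabeled, rounds=5, top_n=500):
    ernn_data, crf_data = list(labeled), list(labeled)
    for _ in range(rounds):
        ernn.train(ernn_data)
        crf.train(crf_data)

        # Each model labels the unannotated target-domain sentences.
        scored = [(s, ernn.predict(s), crf.predict(s)) for s in unlabeled]

        # ERNN's most confident predictions augment the CRF training set,
        # and vice versa.
        by_ernn = sorted(scored, key=lambda x: x[1][1], reverse=True)[:top_n]
        by_crf = sorted(scored, key=lambda x: x[2][1], reverse=True)[:top_n]
        crf_data += [(s, tags) for s, (tags, _), _ in by_ernn]
        ernn_data += [(s, tags) for s, _, (tags, _) in by_crf]

        # Remove the newly labeled sentences from the unannotated pool.
        used = {s for s, _, _ in by_ernn} | {s for s, _, _ in by_crf}
        unlabeled = [s for s in unlabeled if s not in used]
    return ernn, crf
```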