Abstract

Electronic medical records (EMRs) contain rich medical information that is of great value to medical research. The volume of Chinese EMRs is growing, yet current machine-learning-based named entity recognition methods do not take the distinctive characteristics of Chinese EMR text into account. In this paper, a conditional random field (CRF) model is trained and tested on four entity types: disease, symptom, inspection and treatment. First, tag-of-words, part-of-speech and context are selected as the basic features. Second, based on an analysis of the characteristics of Chinese EMR text, the chapter name feature, core word feature and word vector clustering feature are selected as the extended features. The core word feature is obtained by splitting the collected dictionary into characters and words and then counting character frequency and word frequency; the word vector clustering feature is obtained by clustering word vectors. Then, with the help of a constructed medical dictionary, a semi-automatic corpus annotation method is used to randomly extract and classify a corpus of a certain scale. Finally, learning and prediction with the CRF tool CRF++ achieve a precision of 93.03%, a recall of 90.69%, and an F value of 91.85%.
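
The core word feature described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the frequency threshold `min_count`, the feature labels, and all function names are hypothetical, and the dictionary entries are assumed to be already segmented into words (a real pipeline would need a Chinese word segmenter for that step).

```python
from collections import Counter

# Hypothetical toy dictionary of disease terms, already segmented into words.
# Segmentation itself is assumed to have been done beforehand.
DISEASE_TERMS = [
    ["糖尿病"],
    ["高血压", "病"],
    ["冠心病"],
    ["心脏", "病"],
]


def count_units(segmented_terms):
    """Count character frequency and word frequency over dictionary entries."""
    char_freq = Counter()
    word_freq = Counter()
    for words in segmented_terms:
        word_freq.update(words)            # one count per word
        char_freq.update("".join(words))   # one count per character
    return char_freq, word_freq


def frequent_units(freq, min_count):
    """Keep units seen at least min_count times (threshold is an assumption)."""
    return {unit for unit, n in freq.items() if n >= min_count}


def core_word_feature(token, core_chars, core_words):
    """Return an illustrative feature label for one token."""
    if token in core_words:
        return "CORE_WORD"
    if any(ch in core_chars for ch in token):
        return "CORE_CHAR"
    return "NONE"


char_freq, word_freq = count_units(DISEASE_TERMS)
core_chars = frequent_units(char_freq, min_count=2)
core_words = frequent_units(word_freq, min_count=2)

for token in ["病", "肺心病", "头痛"]:
    print(token, core_word_feature(token, core_chars, core_words))
```

In a CRF++ setup, a label like this would typically be written as one extra column per token in the training file, alongside the other basic and extended features; how the original work encodes it is not stated in the abstract.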
