Abstract
Named entity recognition is the core technology for mining key information from Chinese electronic medical records. Its purpose is to identify the different categories of valuable entities that these records contain. Chinese electronic medical records, however, pose serious difficulties: disease names, clinical symptoms and drug names are complicated; different doctors use professional terms differently; writing is irregular; professional terms are abundant; and the boundaries between entities are unclear. In addition, because the records involve patient privacy, unauthorized disclosure of such medical text is not in line with national regulations, so at present there is almost no public Chinese electronic medical record corpus. All of these problems increase the difficulty of entity recognition in Chinese electronic medical records. To better identify entities in medical texts, this paper formulates a set of labeling rules for medical texts and applies machine learning to a data set annotated according to these rules. A Conditional Random Field (CRF) model is used and obtains good experimental results. In an entity recognition experiment on a large number of medical records, the precision, recall and F value reach 87.92%, 82.33% and 85.03%, respectively, which are better than the evaluation indexes of existing studies and can meet the needs of practical application.
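As a quick consistency check on the reported figures, the sketch below recomputes the F value from the other two metrics. It assumes that the F value is the balanced F1 score (the harmonic mean of precision and recall) and that the first reported figure, 87.92%, is precision; the abstract does not state either point explicitly. The function name `f_score` and its `beta` parameter are illustrative, not taken from the paper.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures reported in the abstract, in percent.
# Assumption: 87.92 is precision and the F value is the balanced F1.
reported_precision = 87.92
reported_recall = 82.33

f1 = f_score(reported_precision, reported_recall)
print(round(f1, 2))
```

Under these assumptions the recomputed value agrees with the reported 85.03% to two decimal places.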