Abstract
Chinese medical natural language processing is held back by the limited availability of high-quality labeled data, which makes it difficult to train accurate entity recognition models. In response, we propose MCBERT-GCN-CRF, a jointly trained model that achieves high performance in identifying medical entities in Chinese electronic medical records. We also introduce CM-NER, a 5-step framework that mitigates the effect of noise in weakly labeled data and establishes a principled connection between supervised and weakly supervised named entity recognition. By suppressing noise in weakly labeled data, our approach significantly improves recall and accuracy and outperforms traditional fully supervised pre-trained models and other state-of-the-art methods. The framework achieves an F1 score of 86.29% on the CCKS-2019 dataset, significantly higher than the 74.17% to 83.06% obtained by pre-trained model baselines, and higher than the top-performing supervised named entity recognition models in the CCKS-2019 competition. These results demonstrate the effectiveness of the framework and highlight the potential of leveraging unlabeled data to train accurate named entity recognition models for Chinese medical text. This research has significant implications for advancing natural language processing techniques in the medical domain and improving patient care.
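For readers unfamiliar with how an F1 score for named entity recognition is typically obtained, the following is a minimal sketch, not the authors' evaluation code. It assumes BIO-style tags and exact matching on both span boundaries and entity type, which is a common convention but may differ from the official CCKS-2019 scorer. The function names `bio_to_spans` and `entity_prf` and the labels `DISEASE` and `DRUG` are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if not continues:
            if etype is not None:
                spans.add((start, i, etype))
            start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def entity_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact span-and-type match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: the gold sequence has two entities, the prediction finds one.
gold_tags = ["B-DISEASE", "I-DISEASE", "O", "B-DRUG"]
pred_tags = ["B-DISEASE", "I-DISEASE", "O", "O"]
precision, recall, f1 = entity_prf(gold_tags, pred_tags)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the prediction recovers one of two gold entities without any false positives, so precision is 1.0 and recall is 0.5; F1 is their harmonic mean, which illustrates why a framework that raises recall without hurting precision also raises F1.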