NE–LP: Normalized entropy- and loss prediction-based sampling for active learning in Chinese word segmentation on EHRs

Tingting Cai, Zhiyuan Ma, Yangming Zhou, Hong Zheng

doi:10.1007/s00521-021-05896-w

Abstract

Electronic health records (EHRs) in hospital information systems contain patients’ diagnoses and treatments, so EHRs are essential to clinical data mining. Of all the tasks in the mining process, Chinese word segmentation (CWS) is a fundamental one, and most state-of-the-art methods rely heavily on large amounts of manually annotated data. Since annotation is time-consuming and expensive, efforts have been devoted to techniques such as active learning, which locate the most informative samples for modeling. In this paper, we follow this trend and present an active learning method for CWS in EHRs. Specifically, we propose a new sampling strategy that combines normalized entropy with loss prediction (NE–LP) to select the most valuable data. To minimize the computational cost of learning, we also propose a joint model comprising a word segmenter and a loss prediction model. Furthermore, to capture interactions between adjacent characters, bigram features are applied in the joint model. To illustrate the effectiveness of NE–LP, we conducted experiments on EHRs collected from the Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine. The results demonstrate that NE–LP consistently outperforms conventional uncertainty-based sampling strategies for active learning in CWS.
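The abstract does not give the NE–LP scoring formula, so the following is only a minimal sketch of how an entropy-plus-predicted-loss sampling step might look. It assumes that "normalized entropy" means per-character tag entropy averaged over sentence length, and that the two signals are combined by a weighted sum; both are illustrative assumptions, not the authors' definitions. The names `normalized_entropy`, `select_batch`, and `weight` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One pool item: (sentence id, per-character tag distributions, predicted loss).
PoolItem = Tuple[str, Sequence[Sequence[float]], float]


def normalized_entropy(label_probs: Sequence[Sequence[float]]) -> float:
    """Average per-character Shannon entropy of one sentence.

    label_probs[i] is the segmenter's distribution over tags (e.g. B/M/E/S)
    for character i. Dividing by sentence length is an ASSUMED normalization,
    chosen so that long sentences are not favored just for being long.
    """
    if not label_probs:
        return 0.0
    total = 0.0
    for dist in label_probs:
        # Skip zero probabilities: p * log(p) tends to 0 as p tends to 0.
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(label_probs)


def select_batch(pool: Sequence[PoolItem], k: int, weight: float = 1.0) -> List[str]:
    """Return ids of the k highest-scoring unlabeled sentences.

    The score (entropy + weight * predicted loss) is a HYPOTHETICAL way of
    combining the two signals; the paper's actual rule may differ.
    """
    scored = [
        (normalized_entropy(probs) + weight * predicted_loss, sentence_id)
        for sentence_id, probs, predicted_loss in pool
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence_id for _, sentence_id in scored[:k]]
```

In an active learning loop, the selected ids would be sent for manual annotation, added to the training set, and the joint model retrained before the next round of scoring.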

Journal: Neural Computing and Applications
Publication Date: Mar 29, 2021
Citations: 8
