Abstract
Multi-label text classification, which tags a given plain text with the most relevant labels from a label space, is an important task in the natural language process. To diagnose diseases, clinical researchers use a machine-learning algorithm to do multi-label clinical text classification. However, conventional machine learning methods can neither capture deep semantic information nor the context of words strictly. Diagnostic information from the EHRs (Electronic Health Records) is mainly constructed by unstructured clinical free text which is an obstacle for clinical feature extraction. Moreover, feature engineering is time-consuming and labor-intensive. With the rapid development of deep learning, we apply neural network models to resolve this problem mentioned above. To favor multi-label classification on EHRs, we propose FAMLC-BERT (Feature-level Attention for Multi-label classification on BERT) to capture semantic features from different layers. The model uses feature-level attention with BERT to recognize the labels of EHRs. We empirically compared our model with other state-of-the-art models on real-world documents collected from the hospital. Experiments show that our model achieved significant improvements compared to other selected benchmarks.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have