Abstract

Clinical data describing a patient's health status can be multi-labelled. For example, a clinical record describing a patient suffering from cough and fever should be tagged with both disease labels. Such co-occurring labels are often interrelated, and these relations can be exploited to improve disease classification. In this work, we treat the categorization of free clinical text as a multi-label learning problem. However, we find that some commonly used multi-label learning methods can suffer from severe side effects when exploiting complicated relations among disease labels, such as over-exploitation of label relations and error propagation in label prediction. Based on these findings, we propose a novel multi-label learning algorithm called Ensemble of Sampled Classifier Chains (ESCC) to improve the classification of clinical text data. ESCC automatically learns to select the relevant disease information that helps improve classification performance when exploiting possible disease relations. In our experiments, ESCC shows strong advantages over other state-of-the-art multi-label algorithms on medical text data, with significant improvements in performance. The proposed algorithm is promising for mining knowledge from a wide range of multi-label medical text data.
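
To make the multi-label setting concrete, the following minimal sketch shows how records tagged with several disease labels can be encoded as binary indicator vectors, and how label co-occurrence can be counted. It is an illustration only and is not the ESCC algorithm: the label vocabulary, the toy records, and the helper names `to_indicator` and `cooccurrence` are all hypothetical and do not come from the paper or its data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical label vocabulary and toy records, invented for illustration only.
LABELS = ["cough", "fever", "headache"]

records = [
    ("Patient reports cough and fever for three days.", {"cough", "fever"}),
    ("Persistent headache, no fever.", {"headache"}),
    ("Fever with dry cough and mild headache.", {"cough", "fever", "headache"}),
]


def to_indicator(tags, labels=LABELS):
    """Encode a set of labels as a binary indicator vector in label order."""
    return [1 if label in tags else 0 for label in labels]


def cooccurrence(tag_sets):
    """Count how often each unordered pair of labels appears together."""
    counts = Counter()
    for tags in tag_sets:
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts


# One indicator row per record: the usual target matrix in multi-label learning.
Y = [to_indicator(tags) for _, tags in records]

# Pairwise co-occurrence counts, a simple view of the label relations.
pair_counts = cooccurrence(tags for _, tags in records)
```

In this toy data the pair `("cough", "fever")` co-occurs most often; this kind of relation between labels is what multi-label methods try to exploit.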
