A study of the effectiveness of machine learning methods for classification of clinical interview fragments into a large number of categories

Mehedi Hasan, Alexander Kotov, April Idalski Carcone, Ming Dong, Sylvie Naar, Kathryn Brogan Hartlieb

doi:10.1016/j.jbi.2016.05.004

Open Access

https://doi.org/10.1016/j.jbi.2016.05.004

Abstract

This study examines the effectiveness of state-of-the-art supervised machine learning methods in conjunction with different feature types for the task of automatic annotation of fragments of clinical text based on codebooks with a large number of categories. We used a collection of motivational interview transcripts consisting of 11,353 utterances, which were manually annotated by two human coders as the gold standard, and experimented with state-of-the-art classifiers, including Naïve Bayes, J48 Decision Tree, Support Vector Machine (SVM), Random Forest (RF), AdaBoost, DiscLDA, Conditional Random Fields (CRF) and Convolutional Neural Network (CNN), in conjunction with lexical, contextual (label of the previous utterance) and semantic (distribution of words in the utterance across the Linguistic Inquiry and Word Count dictionaries) features. We found that, when the number of classes is large, the performance of CNN and CRF is inferior to that of SVM. When only lexical features were used, SVM annotated the interview transcripts with the highest classification accuracy among all classifiers: 70.8%, 61% and 53.7% for codebooks consisting of 17, 20 and 41 codes, respectively. Adding contextual features, semantic features, or their combination to the lexical ones improved the accuracy of SVM for annotation of utterances in motivational interview transcripts with the 17-class codebook to 71.5%, 74.2% and 75.1%, respectively. Our results demonstrate the potential of using machine learning methods in conjunction with lexical, semantic and contextual features for automatic annotation of clinical interview transcripts with near-human accuracy.
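The three feature types named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the feature names, the `<START>` marker, the example codes `"A"` and `"B"`, and the tiny `TOY_LEXICON` (a stand-in for the Linguistic Inquiry and Word Count dictionaries, which are not reproduced here) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stand-in for the LIWC dictionaries (assumption, not real LIWC data).
TOY_LEXICON = {
    "positive_emotion": {"good", "happy", "great"},
    "negation": {"not", "never", "no"},
    "ingestion": {"eat", "food", "snack"},
}

PUNCTUATION = ".,!?;:"


def tokenize(utterance):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    stripped = (tok.strip(PUNCTUATION) for tok in utterance.lower().split())
    return [tok for tok in stripped if tok]


def utterance_features(utterance, previous_label):
    """Build one sparse feature dict combining the three feature types."""
    tokens = tokenize(utterance)
    features = {}

    # Lexical features: unigram counts.
    for token, count in Counter(tokens).items():
        features["word=" + token] = float(count)

    # Contextual feature: label of the previous utterance.
    features["prev_label=" + previous_label] = 1.0

    # Semantic features: share of tokens falling in each lexicon category.
    total = len(tokens) or 1
    for category, words in TOY_LEXICON.items():
        hits = sum(1 for tok in tokens if tok in words)
        features["sem=" + category] = hits / total

    return features


def transcript_features(utterances, labels, start_label="<START>"):
    """Featurize a transcript, passing each utterance the preceding label."""
    rows = []
    previous = start_label
    for utterance, label in zip(utterances, labels):
        rows.append(utterance_features(utterance, previous))
        previous = label
    return rows


if __name__ == "__main__":
    rows = transcript_features(
        ["I do not eat good food.", "That is great!"],
        ["A", "B"],  # hypothetical codes, not taken from the study's codebooks
    )
    for row in rows:
        print(sorted(row.items()))
```

Feature dicts of this shape could then be vectorized and passed to any of the classifiers listed above, for example a linear SVM. The sketch uses the annotated label of the previous utterance; whether the previous label is gold or predicted at annotation time is not stated in the abstract.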

Journal: Journal of Biomedical Informatics
Publication Date: May 13, 2016
Citations: 48
License type: publisher-specific-oa
