A systematic comparison of feature space effects on disease classifier performance for phenotype identification of five diseases

Christopher Kotfila, Özlem Uzuner

doi:10.1016/j.jbi.2015.07.016

Open Access

https://doi.org/10.1016/j.jbi.2015.07.016

Abstract

Automated phenotype identification plays a critical role in cohort selection and bioinformatics data mining. Natural Language Processing (NLP)-informed classification techniques can robustly identify phenotypes in unstructured medical notes. In this paper, we systematically assess the effect of naive, lexically normalized, and semantic feature spaces on classifier performance for obesity, atherosclerotic cardiovascular disease (CAD), hyperlipidemia, hypertension, and diabetes. We train support vector machines (SVMs) using individual feature spaces as well as combinations of these feature spaces on two small training corpora (730 and 790 documents) and a combined (1520 documents) training corpus. We assess the importance of feature spaces and training data size on SVM model performance. We show that inclusion of semantically informed features does not yield a statistically significant improvement in performance for these models. The addition of training data has weak effects of mixed statistical significance across disease classes, suggesting that larger corpora are not necessary to achieve relatively high performance with these models.
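The idea of training one classifier per feature space, and on combinations of feature spaces, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the function names are hypothetical, lexical normalization is reduced to lowercasing and punctuation stripping, the semantic feature space is omitted, the SVM is a small subgradient-descent linear model, and the four "documents" are invented toy strings. The toy outcome shows only the mechanics of swapping feature spaces and says nothing about the paper's results.

```python
import re
from collections import Counter

def naive_features(text):
    """Naive feature space: whitespace-delimited tokens, left untouched."""
    return Counter("naive=" + tok for tok in text.split())

def normalized_features(text):
    """Stand-in for lexical normalization: lowercase, drop punctuation and digits."""
    return Counter("norm=" + tok for tok in re.findall(r"[a-z]+", text.lower()))

def combine(*spaces):
    """Combine feature spaces by concatenation; prefixes keep names distinct."""
    combined = Counter()
    for space in spaces:
        combined.update(space)
    return combined

def score(weights, feats):
    """Linear decision value w . x over sparse features."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def train_linear_svm(examples, epochs=50, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss (Pegasos-style).

    `examples` is a list of (features, label) pairs with label in {+1, -1}.
    No bias term is used, to keep the sketch short.
    """
    weights = {}
    step = 0
    for _ in range(epochs):
        for feats, label in examples:
            step += 1
            eta = 1.0 / (lam * step)
            margin = label * score(weights, feats)
            # Shrink all weights (gradient of the regularizer).
            for f in weights:
                weights[f] *= 1.0 - eta * lam
            # Hinge-loss subgradient applies only when the margin is violated.
            if margin < 1:
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + eta * label * v
    return weights

def predict(weights, feats):
    """Return +1 (phenotype present) or -1 (absent)."""
    return 1 if score(weights, feats) > 0 else -1

# Invented toy notes; +1 means the note mentions the target disease.
notes = [
    ("Patient has Diabetes.", 1),
    ("diabetes mellitus type 2", 1),
    ("No hypertension noted", -1),
    ("Denies chest pain", -1),
]

def both(text):
    """Combined feature space: naive plus normalized features."""
    return combine(naive_features(text), normalized_features(text))

# One model per feature space, plus one on the combination.
models = {
    name: (extract, train_linear_svm([(extract(t), y) for t, y in notes]))
    for name, extract in [("naive", naive_features),
                          ("normalized", normalized_features),
                          ("combined", both)]
}

unseen = "DIABETES."
for name, (extract, weights) in models.items():
    print(name, predict(weights, extract(unseen)))
```

On this toy input the naive model has never seen the exact token `DIABETES.`, so its decision value is zero, while the normalized and combined models map it onto a feature seen in training. A real comparison would use held-out clinical notes and a significance test across models, as the abstract describes.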

Journal: Journal of Biomedical Informatics
Publication Date: Aug 1, 2015
Citations: 24
License type: cc-by-nc-nd
