A machine learning-based framework to identify type 2 diabetes through electronic health records

Tao Zheng, Wei Xie, Liling Xu, Xiaoying He, Ya Zhang, Mingrong You, Gong Yang, You Chen

doi:10.1016/j.ijmedinf.2016.09.014

Abstract

Objective: Discovering diverse genotype-phenotype associations for Type 2 Diabetes Mellitus (T2DM) via genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) requires identifying more cases (T2DM subjects) and controls (subjects without T2DM), e.g., via Electronic Health Records (EHR). However, existing expert-based identification algorithms often suffer from a low recall rate and can miss a large number of valuable samples under conservative filtering standards. The goal of this work is to develop, as a pilot study, a semi-automated machine learning framework that liberalizes the filtering criteria to improve recall while keeping the false positive rate low.

Materials and methods: We propose a data-informed framework for identifying subjects with and without T2DM from EHR via feature engineering and machine learning. We evaluate and contrast the identification performance of widely used machine learning models within our framework, including k-Nearest-Neighbors, Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression. The framework was evaluated on 300 patient samples (161 cases, 60 controls and 79 unconfirmed subjects), randomly selected from a diabetes-related cohort of 23,281 retrieved from a regional distributed EHR repository covering 2012 to 2014.

Results: We apply top-performing machine learning algorithms to the engineered features. We benchmark the accuracy, precision, AUC, sensitivity and specificity of the classification models against the state-of-the-art expert algorithm for identifying T2DM subjects. The framework achieved high identification performance (∼0.98 average AUC), much higher than the state-of-the-art algorithm (0.71 AUC).

Discussion: Expert algorithm-based identification of T2DM subjects from EHR is often hampered by high missing rates due to conservative selection criteria. Our framework leverages machine learning and feature engineering to loosen such selection criteria and achieve a high identification rate of cases and controls.

Conclusions: Our proposed framework demonstrates a more accurate and efficient approach for identifying subjects with and without T2DM from EHR.
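The evaluation measures named in the abstract (accuracy, precision, sensitivity, specificity and AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and the labels and scores below are invented toy values, not data from the study. Label 1 stands for a T2DM case and 0 for a control; AUC is computed as the fraction of case-control pairs in which the case receives the higher score, with ties counted as one half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = case and 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, sensitivity (recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values): four cases followed by four controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print(auc)
```

Sensitivity here is the recall rate the abstract aims to improve, and 1 minus specificity is the false positive rate it aims to keep low. AUC does not depend on the 0.5 threshold, which makes it a convenient single number for comparing score-producing classifiers with one another.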

Journal: International Journal of Medical Informatics
Publication Date: Oct 1, 2016
Citations: 288

