Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project.

Manal Alghamdi, Jonathan Ehrman, Clinton Brawner, Mouaz Al-Mallah, Sherif Sakr, Steven Keteyian, Bin Liu

doi:10.1371/journal.pone.0179805

Abstract

Machine learning is becoming a popular and important approach in medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree, and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data from 32,555 patients free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. By the end of the fifth year, 5,099 of those patients had developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an ensemble-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of class imbalance on the constructed model was handled with the Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model was improved by an ensemble machine learning approach using the Vote method with three decision-tree-based classifiers (Naïve Bayes Tree, Random Forest, and Logistic Model Tree), which achieved high predictive performance (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
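The oversampling step named in the abstract can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority-class sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy two-feature data are assumptions made for illustration, and the sketch picks base samples at random rather than cycling through every minority sample as the original algorithm description does.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    minority    : list of numeric feature vectors from the minority class
    n_synthetic : number of synthetic samples to generate
    k           : number of nearest minority neighbours to choose from
    seed        : seed for the random generator, for reproducibility
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        # Simplification (assumption): pick the base sample at random.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples, ordered by Euclidean distance to the base.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: four minority-class samples with two features each.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_synthetic=6, k=2)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the per-feature range of the minority class. In practice the synthetic samples would be added to the training split only, so that evaluation data contains real patients exclusively.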

Highlights

Over the last century, the prevalence of diabetes has been increasing dramatically with the aging population worldwide.
The aim of this study is to take advantage of the unique opportunity provided by our access to a large and rich clinical research dataset collected by the Henry Ford ExercIse Testing (FIT) project [13] and to use it to investigate the relative performance of various machine learning classification methods such as Decision Tree (DT), Naïve Bayes (NB), Logistic Regression (LR), Logistic Model Tree (LMT), and Random Forests (RF) for predicting incident diabetes using medical records of cardiorespiratory fitness.
The results show that the Logistic Regression (LR) classifier achieves the highest performance (69.1% for G1 and 68.9% for G2), while the J48 Decision Tree (DT) classifier achieves the lowest performance (63.2% for G1 and 64.5% for G2).

Summary

Objectives

The aim of this study is to take advantage of the unique opportunity provided by our access to a large and rich clinical research dataset collected by the Henry Ford ExercIse Testing (FIT) project [13] and to use it to investigate the relative performance of various machine learning classification methods such as Decision Tree (DT), Naïve Bayes (NB), Logistic Regression (LR), Logistic Model Tree (LMT), and Random Forests (RF) for predicting incident diabetes using medical records of cardiorespiratory fitness.

Journal: PLOS ONE	Publication Date: Jul 24, 2017
Citations: 209	License type: CC BY 4.0

Similar Papers

Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT) project
Sherif Sakr ... Mouaz H Al-Mallah
BMC Medical Informatics and Decision Making | VOL. 17
01 Dec 2017

A MET a Day Keeps Arrhythmia at Bay: The Association Between Exercise or Cardiorespiratory Fitness and Atrial Fibrillation
Suraj Kapa ... Samuel J Asirvatham
Mayo Clinic Proceedings | VOL. 91
08 Apr 2016

Corporate Bankruptcy Prediction: An Approach Towards Better Corporate World
Talha Mahboob Alam ... Mubbashar Mushtaq
The Computer Journal | VOL. 64
17 Jun 2020

Synthetic Minority Over-sampling TEchnique (SMOTE) and Logistic Model Tree (LMT)-Adaptive Boosting algorithms for classifying imbalanced datasets of nutrient and chlorophyll sufficiency levels of oil palm (Elaeis guineensis) using spectroradiometers and unmanned aerial vehicles
Amiratul Diyana Amirruddin ... Mohd Hasmadi Ismail
Computers and Electronics in Agriculture | VOL. 193
03 Jan 2022
