Memory-Efficient, Accurate and Early Diagnosis of Diabetes Through a Machine Learning Pipeline Employing Crow Search-Based Feature Engineering and a Stacking Ensemble

Shirina Samreen

doi:10.1109/access.2021.3116383

Abstract

The early diagnosis of diabetes helps avoid the major risks associated with the disorder. The proposed research designs a machine learning pipeline that generates the most representative feature subset of minimal size that predicts the onset of diabetes with the highest accuracy. It employs a novel diabetes dataset which, unlike the well-known PID dataset, is gender-neutral and sufficiently representative. The machine learning pipelines comprise multiple feature engineering pipelines, each of which generates a reduced feature subset that is fed into multiple heterogeneous classifiers. The feature engineering involves both feature selection and feature extraction: the former uses the ANOVA filter and the Crow Search Optimization algorithm, and the latter employs Singular Value Decomposition. Classification is performed on the preprocessed dataset using a wide range of heterogeneous classifiers, namely Naive Bayes, Logistic Regression, K-Nearest Neighbor, Decision Trees, Support Vector Machine, Random Forest, AdaBoost, and GradientBoost, as base learners, followed by their stacking ensemble. The performance of each machine learning pipeline is evaluated through Repeated Stratified K-fold Cross Validation using accuracy, precision, recall, F1 score, and the area under the Receiver Operating Characteristic curve. The number of features in the preprocessed dataset varies across pipelines, and the highest accuracy of 98.4% is achieved by the Crow Search pipeline with a stacking ensemble of multiple heterogeneous classifiers. A comparative analysis with a recent related work on the same dataset shows that the proposed feature engineering pipelines, using the same set of classifiers, achieve higher accuracy with a smaller feature set.
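As a rough illustration of the feature-selection stage described above, the pure-Python sketch below ranks features with the one-way ANOVA F statistic and then searches for a feature subset with a binary variant of the standard Crow Search update rule. It is not the author's implementation: the fitness function (leave-one-out accuracy of a nearest-centroid classifier minus a small penalty on subset size), the 0.5 threshold used to binarize crow positions, the parameter values (flight length `fl`, awareness probability `ap`, penalty `alpha`), the toy data, and all function names are assumptions made for illustration only.

```python
import random
from statistics import mean


def anova_f(column, labels):
    """One-way ANOVA F statistic of one numeric feature across class labels."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    k, n = len(groups), len(column)  # assumes at least two classes
    grand = mean(column)
    means = {c: mean(g) for c, g in groups.items()}
    ssb = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
    ssw = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ssw == 0:
        return float("inf") if ssb > 0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n - k))


def centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier (a hypothetical
    stand-in for the paper's classifiers) restricted to the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for r, (row, label) in enumerate(zip(X, y)):
            if r == i:  # hold out sample i
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for t, j in enumerate(cols):
                acc[t] += row[j]
            counts[label] = counts.get(label, 0) + 1

        def sq_dist(label):
            # Squared Euclidean distance from sample i to the class centroid
            return sum((X[i][j] - s / counts[label]) ** 2
                       for j, s in zip(cols, sums[label]))

        correct += min(sums, key=sq_dist) == y[i]
    return correct / len(X)


def crow_search(X, y, n_crows=8, iters=20, fl=2.0, ap=0.1, alpha=0.01, seed=0):
    """Binary Crow Search: positions live in [0, 1]^d and a feature is kept when
    its coordinate exceeds 0.5 (an assumed binarization rule)."""
    rng = random.Random(seed)
    d = len(X[0])

    def fitness(pos):
        mask = [p > 0.5 for p in pos]
        # Assumed objective: accuracy minus a small penalty on subset size
        return centroid_accuracy(X, y, mask) - alpha * sum(mask) / d

    pos = [[rng.random() for _ in range(d)] for _ in range(n_crows)]
    mem = [p[:] for p in pos]  # each crow's best-known hiding place
    mem_fit = [fitness(m) for m in mem]
    for _ in range(iters):
        for i in range(n_crows):
            j = rng.randrange(n_crows)  # crow i follows a randomly chosen crow j
            if rng.random() >= ap:
                # j is unaware of being followed: i moves toward j's memory
                r = rng.random()
                new = [min(1.0, max(0.0, x + r * fl * (m - x)))
                       for x, m in zip(pos[i], mem[j])]
            else:
                # j notices and fools i: i jumps to a random position
                new = [rng.random() for _ in range(d)]
            pos[i] = new
            f = fitness(new)
            if f > mem_fit[i]:  # update crow i's memory only if improved
                mem[i], mem_fit[i] = new[:], f
    best = max(range(n_crows), key=lambda c: mem_fit[c])
    return [p > 0.5 for p in mem[best]], mem_fit[best]


# Hypothetical toy data: feature 0 separates the classes, features 1-2 are noise.
data_rng = random.Random(1)
X, y = [], []
for i in range(40):
    label = i % 2
    X.append([label + data_rng.gauss(0, 0.2), data_rng.gauss(0, 1), data_rng.gauss(0, 1)])
    y.append(label)

ranking = sorted(range(3), key=lambda j: anova_f([row[j] for row in X], y), reverse=True)
mask, best_fit = crow_search(X, y)
print("ANOVA ranking:", ranking)
print("selected mask:", mask, "fitness:", round(best_fit, 3))
```

Replacing `centroid_accuracy` with the cross-validated accuracy of the actual base learners, or of their stacking ensemble, would bring this wrapper closer to the pipeline the abstract describes, at a much higher computational cost.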

Highlights

A very common chronic disorder prevalent in the modern world is Diabetes Mellitus
The results of various experiments employing the different classifiers with the proposed feature engineering pipelines (FEP) are described in multiple subsections
In the current research, a novel diabetes dataset from the UCI repository is employed rather than the benchmark Pima Indian Diabetes (PID) dataset

Summary

Introduction

Diabetes Mellitus is a very common chronic disorder in the modern world. It has become a serious health issue worldwide, irrespective of geographic boundaries. The disorder is associated with insulin, a hormone produced by the pancreas. It occurs in one of three forms: Type 1, Type 2, and gestational diabetes [1]. Type 1 diabetes occurs when the body's immune system destroys the beta cells of the pancreas. The body then has deficient insulin, which makes the glucose absorption in

Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 3	License type: CC BY 4.0

Similar Papers

Transformation based Diabetes Classification using Crow-Search Optimization Algorithm
Harikumar Rajaguru ... Sannasi Chakravarthy S R
09 Oct 2021

Energy-Efficient Hybrid Firefly–Crow Optimization Algorithm for VM Consolidation
Nimmol P John ... V R Bindu
01 Jan 2020

Feature selection strategy based on hybrid crow search optimization algorithm integrated with chaos theory and fuzzy c-means algorithm for medical diagnosis problems
Ahmed M Anter ... Mumtaz Ali
Soft Computing | VOL. 24
20 Apr 2019

A Hybrid Crow Search and Grey Wolf Optimization Technique for Enhanced Medical Data Classification in Diabetes Diagnosis System
C Mallika ... S Selvamuthukumaran
International Journal of Computational Intelligence Systems | VOL. 14
01 Sep 2021