An effective correlation-based data modeling framework for automatic diabetes prediction using machine and deep learning techniques

Chaitanya Kumar Marpu,Kiran Kumar Patro,Jaya Prakash Allam,Pawel Plawiak,Nagwan Abdel Samee,Maali Alabdulhafith,Umamaheswararao Sanapala

doi:10.1186/s12859-023-05488-6

Abstract

The rising risk of diabetes, particularly in emerging countries, highlights the importance of early detection. Manual prediction can be a challenging task, leading to the need for automatic approaches. The major challenge with biomedical datasets is data scarcity. Biomedical data is often difficult to obtain in large quantities, which can limit the ability to train deep learning models effectively. Biomedical data can be noisy and inconsistent, which can make it difficult to train accurate models. To overcome the above-mentioned challenges, this work presents a new framework for data modeling that is based on correlation measures between features and can be used to process data effectively for predicting diabetes. The standard, publicly available Pima Indians Medical Diabetes (PIMA) dataset is utilized to verify the effectiveness of the proposed techniques. Experiments using the PIMA dataset showed that the proposed data modeling method improved the accuracy of machine learning models by an average of 9%, with deep convolutional neural network models achieving an accuracy of 96.13%. Overall, this study demonstrates the effectiveness of the proposed strategy in the early and reliable prediction of diabetes.

Full Text