Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance

Md Ahsan, M Mahmud, Kishor Gupta, Pritom Saha, Zahed Siddique

doi:10.3390/technologies9030052


Open Access


Journal: Technologies
Publication Date: Jul 24, 2021
Citations: 239
License type: CC BY 4.0

Affiliation: University of Oklahoma, University of Memphis

Abstract

Heart disease, one of the leading causes of death worldwide, requires a sophisticated and expensive diagnostic process. Much recent literature has demonstrated that machine learning approaches offer an opportunity to diagnose heart disease patients efficiently. However, dataset problems such as missing data, inconsistent data, and mixed data (containing inconsistent and missing values in both numerical and categorical form) are often obstacles in medical diagnosis. Such inconsistency leads to a higher probability of misprediction and misleading results. Data preprocessing steps such as feature reduction, data conversion, and data scaling are employed to form a standard dataset; these measures play a crucial role in reducing inaccuracy in the final prediction. This paper evaluates eleven machine learning (ML) algorithms: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Naive Bayes (NB), Support Vector Machine (SVM), XGBoost (XGB), Random Forest Classifier (RF), Gradient Boost (GB), AdaBoost (AB), and Extra Tree Classifier (ET). It combines them with six data scaling methods: Normalization (NR), Standscale (SS), MinMax (MM), MaxAbs (MA), Robust Scaler (RS), and Quantile Transformer (QT). All are applied to a dataset comprising information on patients with heart disease. The results show that CART, combined with RS or QT, outperforms all other ML algorithms, with 100% accuracy, 100% precision, 99% recall, and 100% F1 score. The study outcomes demonstrate that a model's performance varies depending on the data scaling method.
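To make the scaling methods named above concrete, the following is a minimal sketch of four of them (MM, SS, MA, RS) applied to a single feature column. It assumes the conventional textbook definitions of these transforms; the abstract does not state the exact implementation the authors used, and the function names and the sample column are hypothetical. NR and QT are omitted because they need row-wise norms and an empirical distribution, respectively.

```python
from statistics import mean, median, pstdev, quantiles

def min_max_scale(values):
    """MinMax (MM): map the column linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standard_scale(values):
    """Standard scaling (SS): subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def max_abs_scale(values):
    """MaxAbs (MA): divide by the largest absolute value."""
    m = max(abs(v) for v in values)
    return [v / m for v in values]

def robust_scale(values):
    """Robust scaling (RS): subtract the median, divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    return [(v - med) / (q3 - q1) for v in values]

# Hypothetical feature column with one outlier (not taken from the paper's dataset).
col = [200, 220, 240, 260, 560]

for name, fn in [("MM", min_max_scale), ("SS", standard_scale),
                 ("MA", max_abs_scale), ("RS", robust_scale)]:
    print(name, [round(x, 3) for x in fn(col)])
```

On this toy column, the outlier compresses the other four MinMax values into the bottom of the [0, 1] range, while the robust variant keeps them evenly spread between -1 and 0.5. That is one way the choice of scaler can change what a downstream model sees.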

Highlights

This study evaluated eleven machine learning (ML) algorithms along with six distinct data scaling methods to detect patients with heart disease using the UCI heart disease dataset.
Our findings suggest that data scaling approaches have some effect on ML predictions.
The Classification and Regression Trees (CART) algorithm achieved almost 100% accuracy and outperformed the methods proposed in previous literature for heart disease prediction.
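As a rough illustration of how scaling can affect a prediction, the sketch below runs a 1-nearest-neighbor classifier on three toy records, once on raw features and once after MinMax scaling. The records, the query, and the helper names are hypothetical and are not drawn from the UCI dataset or from the authors' code. It only shows the mechanism for a distance-based method.

```python
from math import dist

# Hypothetical toy records: (age, cholesterol), label 1 = disease, 0 = no disease.
train_X = [(30, 300), (60, 210), (45, 400)]
train_y = [1, 0, 1]
query = (58, 290)

def nearest_label(X, y, q):
    """Return the label of the training point closest to q (1-nearest neighbor)."""
    best = min(range(len(X)), key=lambda i: dist(X[i], q))
    return y[best]

def fit_min_max(X):
    """Learn per-feature (min, max) pairs from the training rows only."""
    return [(min(c), max(c)) for c in zip(*X)]

def apply_min_max(row, params):
    """Scale one row with previously learned (min, max) pairs."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(row, params))

params = fit_min_max(train_X)
scaled_X = [apply_min_max(r, params) for r in train_X]
scaled_q = apply_min_max(query, params)

raw_pred = nearest_label(train_X, train_y, query)         # cholesterol dominates the distance
scaled_pred = nearest_label(scaled_X, train_y, scaled_q)  # both features weigh comparably
print(raw_pred, scaled_pred)  # prints: 1 0
```

On raw features the wide cholesterol range dominates the Euclidean distance, so the query matches the first record. After scaling, the age difference counts comparably and the nearest record changes, which flips the predicted label.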

Summary

Introduction

Patients with heart disease symptoms often require electrocardiography and blood tests to evaluate the disease appropriately [1,2]. Almost 12 million people die due to heart diseases [3]. Diagnosing the disease at an early stage is vital. An automated diagnosis system that could also be operated by nonmedical people would be beneficial. It has been observed that diagnosing heart disease at an early stage using additional patient information and medical history can save time and money and protect health [4]. Several studies have shown that a decision support system can be developed from that information with the help of machine learning approaches [2,5,6,7,8,9].


