Comparative Analysis of Machine Learning Models for Fitness Level Prediction with Imbalanced Dataset

Stephanie Chua, Chia Inn Sii, Puteri Nor Ellyza Nohuddin

doi:10.1109/icdi57181.2022.10007339

Abstract

An individual's fitness level is usually synonymous with his or her health status. While some people may often feel healthy and have no health issues, others may suffer from a myriad of health-related problems. In general, people seek the advice of health professionals only as a last resort, once their health issues surface. In this research, we aim to use fitness-related data to predict an individual's fitness level using a machine learning approach. Because the dataset is imbalanced, we first use the Synthetic Minority Oversampling Technique (SMOTE) to balance it. Six machine learning techniques, namely Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbour (KNN), Decision Tree (DT) and Random Forest (RF), were then used to build models for predicting fitness levels of “Fit”, “Average” and “Unfit”. A comparative study using ten-fold cross-validation was then conducted on both the imbalanced and balanced datasets to determine the best model. Experimental results showed that Random Forest performed best in predicting the fitness level of an individual, with an accuracy and F1-score of over 90% when using the balanced dataset. This shows the potential of using this model in an application to predict one's fitness level.
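To illustrate the balancing step, the following is a minimal sketch of the interpolation idea behind SMOTE, written with the Python standard library only. It is not the authors' implementation: the function name `smote_oversample`, the parameter `k`, the two features (resting heart rate and weekly exercise hours) and the toy values are all assumptions made for illustration. In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be used instead.

```python
import math
import random
from collections import Counter


def smote_oversample(X, y, k=5, seed=0):
    """Oversample each non-majority class up to the majority class size.

    Simplified SMOTE-style sketch (hypothetical helper): a synthetic point is
    placed at a random position on the line segment between a minority sample
    and one of its k nearest neighbours from the same class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        minority = [x for x, lab in zip(X, y) if lab == label]
        # Skip the majority class and classes too small to interpolate.
        if count == target or len(minority) < 2:
            continue
        for _ in range(target - count):
            base = rng.choice(minority)
            # k nearest same-class neighbours, excluding the base sample itself.
            neighbours = sorted(
                (p for p in minority if p is not base),
                key=lambda p: math.dist(base, p),
            )[:k]
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic = [b + gap * (n - b) for b, n in zip(base, neighbour)]
            X_out.append(synthetic)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: [resting heart rate, weekly exercise hours].
X = [[60, 6.0], [62, 5.5], [58, 7.0], [61, 6.5], [59, 5.0], [63, 6.2],
     [75, 2.0], [78, 1.5], [74, 2.5],
     [90, 0.5], [88, 0.0]]
y = ["Fit"] * 6 + ["Average"] * 3 + ["Unfit"] * 2

X_bal, y_bal = smote_oversample(X, y, k=3)
print(Counter(y_bal))  # each class now has as many samples as "Fit"
```

The original samples are kept unchanged and only synthetic minority samples are appended, so the balanced dataset is a superset of the imbalanced one.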
