Abstract

Hypothyroidism (hypo means “under” or “below normal”) is the most common thyroid disease. It affects people worldwide irrespective of sex, age, race, level of education, and wealth. Diagnosing hypothyroidism at an early stage allows precautions to be taken against future complications. This work implements a Machine Learning (ML)-based real-time hypothyroid disease classification framework. The hypothyroid dataset from UCI was used to build the ML models. K-Nearest Neighbor (KNN), Random Forest (RF), and Decision Tree (DT) models were implemented, and all three achieved an accuracy of 98%. Min-Max Scaler data normalization was then applied to improve performance, giving an average accuracy of 99% for both the RF and DT models. Because the hypothyroid dataset is imbalanced, accuracy alone can be misleading, and the F1-score is an important metric for evaluating model performance. The models were validated with K-fold cross-validation. The average F1-score was 0.94 for RF, 0.92 for DT, and 0.9 for KNN. These results indicate that the RF model performs well at predicting hypothyroidism and outperforms the models reported in the literature.
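The evaluation protocol summarized above (Min-Max scaling, three classifiers, K-fold cross-validation scored by F1) can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code. The data is a synthetic imbalanced stand-in generated with `make_classification` rather than the UCI hypothyroid dataset, and the helper name `evaluate_models`, the fold count, the class ratio, and all hyperparameters are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(X, y, n_splits=5, seed=0):
    """Return the mean cross-validated F1-score for KNN, RF, and DT."""
    # Hyperparameters are illustrative defaults, not values from the paper.
    models = {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
        "DT": DecisionTreeClassifier(random_state=seed),
    }
    # Stratified folds keep the minority-class share similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, model in models.items():
        # The scaler sits inside the pipeline, so it is fit on the
        # training folds only and then applied to the held-out fold.
        pipeline = make_pipeline(MinMaxScaler(), model)
        fold_f1 = cross_val_score(pipeline, X, y, cv=cv, scoring="f1")
        scores[name] = float(fold_f1.mean())
    return scores


# Synthetic stand-in for an imbalanced binary dataset (about 8% positives).
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=8,
    weights=[0.92, 0.08],
    random_state=0,
)

scores = evaluate_models(X, y)
for name, f1 in scores.items():
    print(f"{name}: mean F1 = {f1:.3f}")
```

Scores on the synthetic data say nothing about the figures reported in the abstract. Wrapping the scaler in a pipeline is a design choice made here to avoid leaking held-out statistics into training; the abstract does not state how the scaler was fit.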

