Abstract

This work investigates the performance of different machine learning (ML) methods for classifying postmenopausal osteoporosis in Thai patients. Our dataset contains 377 samples compiled retrospectively from the medical records of postmenopausal Thai women at the obstetrics and gynecology clinic of Ramathibodi Hospital, Bangkok, Thailand. Missing-data imputation, feature selection, and imbalanced-data handling techniques are applied independently as pre-processing approaches. The performance of different ML algorithms, including k-nearest neighbors (k-NN), neural network (NN), naïve Bayes (NB), Bayesian network (BN), support vector machine (SVM), random forest (RF), and decision tree (DT), is compared between the pre-processed and original data. The results demonstrate that performance varies with the combination of ML algorithm and pre-processing technique. In terms of accuracy, the three best-performing methods are the NN, NB, and RF models when a wrapper approach is used with an appropriate learner. In terms of specificity, the DT model achieves the best performance when the synthetic minority oversampling technique (SMOTE) is applied. When feature selection techniques are applied, the k-NN, BN, and SVM algorithms obtain the best sensitivity, whereas the NN shows the best area under the curve (AUC). Overall, in comparison with the original dataset, the pre-processing approaches improved model performance. Therefore, proper pre-processing techniques should be considered when developing ML classifiers in order to identify the most appropriate model.
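The four criteria used above to rank the models (accuracy, sensitivity, specificity, and area under the curve) can be illustrated with a minimal sketch. It assumes a binary labeling in which 1 marks an osteoporosis case and 0 a non-osteoporosis case, and it assumes both classes are present. The function names and the toy labels and scores are hypothetical, invented for illustration only; they are not data or code from the study.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of positives correctly detected
        "specificity": tn / (tn + fp),  # share of negatives correctly rejected
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values, NOT from the study's dataset).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.35, 0.7]

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```

Because each criterion weights the two classes differently, a model that ranks first on one of them need not rank first on the others, which is consistent with different algorithms leading on different metrics.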
