Anemia, a prevalent hematologic disorder, requires accurate and timely diagnosis for effective management and treatment. This study explores the application of machine learning models to classify anemia types using complete blood count (CBC) data. We evaluated seven tree-based models, DecisionTreeClassifier, ExtraTreeClassifier, RandomForestClassifier, ExtraTreesClassifier, XGBoost, LightGBM, and CatBoost, to identify the most effective approach for anemia diagnosis. The dataset comprised CBC data labeled with anemia diagnoses, sourced from multiple medical facilities. The data were preprocessed, and features were then selected using Variance Inflation Factor (VIF), Predictive Power Score (PPS), and feature importance from ensemble models. The models were trained and evaluated using 5-fold cross-validation, with hyperparameter tuning conducted via GridSearchCV. The DecisionTreeClassifier achieved the highest balanced accuracy, 94.17%, outperforming the more complex ensemble methods. Confusion matrices supported this result and were used to assess its precision and recall. The study underscores the potential of simple decision tree models in medical diagnosis tasks, particularly when datasets are well preprocessed. These findings suggest that machine learning can enhance diagnostic accuracy and efficiency in clinical practice. Future work will explore advanced techniques to further improve performance and integration into clinical workflows.
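The evaluation protocol named in the abstract (5-fold cross-validation, GridSearchCV tuning, balanced accuracy as the metric) can be illustrated with a minimal scikit-learn sketch. This is not the study's code: the synthetic data from `make_classification` stands in for the CBC dataset, the helper name `tune_decision_tree` and the hyperparameter grid are hypothetical, and the use of stratified, shuffled folds is an assumption the abstract does not confirm.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def tune_decision_tree(X, y, seed=0):
    """Grid-search a decision tree with 5-fold CV, scored by balanced accuracy."""
    # Assumption: stratified, shuffled folds; the abstract only says "5-fold".
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # Hypothetical grid; the study's actual search space is not given.
    param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]}
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=seed),
        param_grid,
        scoring="balanced_accuracy",  # mean of per-class recall
        cv=cv,
    )
    search.fit(X, y)
    return search


# Synthetic 3-class data standing in for labeled CBC features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
search = tune_decision_tree(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Balanced accuracy averages recall over classes, which makes it a reasonable choice when some anemia types are much rarer than others; plain accuracy could look high while rare classes are misclassified.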