Background: Many reports on HIV/AIDS show that the number of patients registered for antiretroviral therapy (ART) is increasing over time. However, these reports go no further than a statistical description of the growing patient numbers; they do not attempt to predict patient attributes from the other recorded attributes. This study applies data mining techniques to an ART database, with the main objective of predicting the CD4 status of patients on ART at Jimma and Bonga Hospitals.

Methodology: The study followed the CRISP-DM data mining methodology, which has six phases: business understanding, data understanding, data preparation, model building, evaluation, and deployment. Data were taken from two hospitals in south-west Ethiopia, Jimma and Bonga. A classification approach was used to predict the CD4 status of patients on ART: the J48 decision tree algorithm was used to build the classifier, and the PART rule algorithm was used for comparison with J48.

Results: The best performance was achieved by the J48 decision tree algorithm using a pruned tree built on a reduced attribute set. The model classifies 88.79% of instances correctly and 11.21% incorrectly. Its weighted average precision is 0.88, with a recall of 0.89 and an ROC area of 0.85. The tree has 760 leaves and a tree size of 916, and the model took 0.05 seconds to build. This analysis shows that the model is quite effective at predicting the CD4 status of patients on ART.

Conclusion: The J48 decision tree produced a better classification model than the PART rule algorithm and can be used for prediction. The model indicates that attributes such as Eligible reason, ART status, ART start year, OA weight, OAWHO stage, Current regimen, Family planning, Functional status, Marital status and Past ARV are the most important determinants of CD4 status.
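The weighted average precision and recall reported in the Results can be illustrated with a short sketch. It assumes, as in Weka's "Weighted Avg." row, that each class is weighted by its share of actual instances (its support). Under that weighting, weighted recall reduces algebraically to overall accuracy, which is consistent with the reported recall of 0.89 alongside 88.79% correct classification. The function name `weighted_metrics` and the two-class confusion matrix below are hypothetical, not the study's data or code.

```python
from typing import List, Tuple


def weighted_metrics(cm: List[List[int]]) -> Tuple[float, float, float]:
    """Return (accuracy, weighted precision, weighted recall).

    cm[i][j] counts instances whose actual class is i and predicted class is j.
    Assumption: each class is weighted by its support (row total), mirroring
    Weka's "Weighted Avg." row.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    w_prec = 0.0
    w_rec = 0.0
    for c in range(k):
        support = sum(cm[c])                          # actual instances of class c
        predicted = sum(cm[r][c] for r in range(k))   # instances predicted as class c
        tp = cm[c][c]                                 # correctly classified class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        weight = support / total                      # class share of the data
        w_prec += weight * precision
        w_rec += weight * recall
    return correct / total, w_prec, w_rec


if __name__ == "__main__":
    # Hypothetical two-class matrix (e.g. CD4 "low" vs "not low"), not study data.
    acc, prec, rec = weighted_metrics([[40, 10], [5, 45]])
    print(f"accuracy={acc:.4f} weighted_precision={prec:.4f} weighted_recall={rec:.4f}")
```

With this hypothetical matrix the weighted recall equals the accuracy (0.85), while the weighted precision differs slightly because it depends on how many instances were predicted as each class.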