Risk difference is a valuable measure of association in epidemiology and healthcare that has the potential to be used for variable selection in medical and clinical data. In this study, an attribute ranking algorithm, called AttributeRank, was developed to facilitate variable selection from clinical data sets. The algorithm computes the risk difference between each predictor and the response variable to determine that predictor's level of importance. Its performance was compared with that of four existing variable selection methods (Fisher score, Pearson's correlation, variable importance function, and chi-square) using five clinical data sets on neonatal birthweight, bacterial survival after treatment, myocardial infarction, breast cancer, and diabetes. The variable subsets selected by AttributeRank yielded the highest average classification accuracy across the data sets. AttributeRank proved more valuable for attribute ranking of clinical data sets than the existing algorithms and should be implemented in a user-friendly application in future research.
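To make the ranking idea concrete, the following is a minimal sketch of ranking predictors by risk difference. It is not the published AttributeRank implementation, whose details the abstract does not give. It assumes that every predictor and the response are coded 0/1, that the risk difference is the outcome risk among the exposed minus the outcome risk among the unexposed, and that predictors are ordered by the absolute value of that difference. The function names and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def risk_difference(exposure: Sequence[int], outcome: Sequence[int]) -> float:
    """Outcome risk among the exposed minus outcome risk among the unexposed.

    Assumption: both sequences are coded 0/1 and have the same length.
    """
    exposed = [y for x, y in zip(exposure, outcome) if x == 1]
    unexposed = [y for x, y in zip(exposure, outcome) if x == 0]
    if not exposed or not unexposed:
        raise ValueError("both exposure groups must be non-empty")
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def rank_by_risk_difference(
    predictors: Dict[str, Sequence[int]], outcome: Sequence[int]
) -> List[Tuple[str, float]]:
    """Return (name, |risk difference|) pairs, largest first.

    Assumption: importance is the absolute risk difference; the paper may
    score or break ties differently.
    """
    scores = {
        name: abs(risk_difference(values, outcome))
        for name, values in predictors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy, made-up data for illustration only (not from the study).
    outcome = [1, 1, 1, 0, 0, 0, 1, 0]
    predictors = {
        "smoker": [1, 1, 1, 1, 0, 0, 0, 0],
        "hypertension": [1, 0, 1, 0, 1, 0, 0, 1],
    }
    for name, score in rank_by_risk_difference(predictors, outcome):
        print(f"{name}: {score:.2f}")
```

In this toy example the first predictor separates the outcome groups (risk 0.75 versus 0.25) while the second does not (0.50 versus 0.50), so the first ranks higher. A selected subset would then be the top-ranked predictors passed to a classifier, which is how the abstract describes the evaluation.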