Abstract

Feature selection is an important preprocessing technique used to determine the features that contribute most to the classification of a dataset, and it is typically performed on high-dimensional datasets. Various feature selection algorithms have been proposed for diabetes prediction; however, the effectiveness of these algorithms has not been thoroughly evaluated statistically. In this paper, three feature selection methods classified under the wrapper approach (Sequential Forward Selection, Sequential Backward Selection, and Recursive Feature Elimination) are used to identify the optimal subset of features for classification of the Pima Indians Diabetes dataset, with an Artificial Neural Network (ANN) as the classifying algorithm. All three methods identify the important features of the dataset (Plasma Glucose Concentration and BMI), indicating their effectiveness for feature selection, with Sequential Forward Selection obtaining the feature subset that most improves the ANN. However, there is little to no improvement in the classifier evaluation metrics (accuracy and precision) when the ANN is trained on the optimal subset from each method compared with the original dataset, showing the ineffectiveness of feature selection on the low-dimensional Pima Indians Diabetes dataset.
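The wrapper setup described above can be sketched with off-the-shelf tools. The snippet below is a minimal sketch using scikit-learn's SequentialFeatureSelector with a small MLPClassifier standing in for the paper's ANN; the file name, column names, hidden-layer size, and the subset size of two are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of wrapper-based feature selection on the Pima Indians Diabetes
# dataset, assuming a CSV with the usual 8 feature columns and an "Outcome" label.
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("pima-indians-diabetes.csv")  # assumed path
X, y = data.drop(columns="Outcome"), data["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Small MLP as a stand-in for the paper's ANN (architecture is an assumption).
ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
)

# Sequential Forward Selection: greedily add the feature that most improves CV accuracy.
sfs = SequentialFeatureSelector(
    ann, n_features_to_select=2, direction="forward", scoring="accuracy", cv=5
)
sfs.fit(X_train, y_train)
print("Forward-selected features:", list(X.columns[sfs.get_support()]))

# Sequential Backward Selection: start from all features and greedily remove them.
sbs = SequentialFeatureSelector(
    ann, n_features_to_select=2, direction="backward", scoring="accuracy", cv=5
)
sbs.fit(X_train, y_train)
print("Backward-selected features:", list(X.columns[sbs.get_support()]))

# Compare the ANN trained on the selected subset against the full feature set.
ann.fit(sfs.transform(X_train), y_train)
print("Accuracy on forward-selected subset:", ann.score(sfs.transform(X_test), y_test))
ann.fit(X_train, y_train)
print("Accuracy on all features:", ann.score(X_test, y_test))
```

Recursive Feature Elimination is omitted from this sketch because scikit-learn's RFE expects an estimator that exposes coefficients or feature importances, which a plain MLP does not provide without a custom importance getter.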
