Abstract

Preprocessing accounts for more than half of the machine learning process. Dimensionality reduction is one of the preprocessing tasks and includes feature extraction and feature selection. Feature selection is used to identify relevant features and remove irrelevant ones. The goal of this research is to select relevant features using the wrapper method for an early diabetes prediction dataset that was previously transformed into a numeric dataset. Forward and backward selection are used in the wrapper method, combined with random forest and cross-validation. Random forest is an enhancement of the decision tree: an ensemble of trees, each of which may produce a different or the same result, with the majority result taken as the final result. Feature selection with the wrapper method yields higher accuracy than no feature selection on the numeric dataset, and the number of features can be reduced. Sequential forward selection achieves 98.84% accuracy with 11 selected features, and sequential backward selection achieves 99.03% accuracy with the same number of selected features. Reducing the number of features reduces the complexity of the trees and the time required in the mining process.
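
As a rough illustration of the wrapper idea, the following minimal sketch shows greedy sequential forward and backward selection driven by a scoring function, plus the majority vote a random forest uses to combine its trees. It is not the authors' implementation. The feature names, the `toy_score` function, and its weights are hypothetical placeholders; in a real wrapper, the score would be the cross-validated accuracy of a random forest trained on the candidate feature subset.

```python
from collections import Counter
from typing import Callable, FrozenSet, List, Sequence


def majority_vote(tree_predictions: Sequence[str]) -> str:
    """Combine per-tree predictions by returning the most common label."""
    return Counter(tree_predictions).most_common(1)[0][0]


def sequential_selection(
    all_features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    n_select: int,
    forward: bool = True,
) -> List[str]:
    """Greedy wrapper-style selection.

    forward=True  starts from no features and adds the best one per step.
    forward=False starts from all features and removes the least useful
    one per step. Both stop when n_select features remain.
    """
    if not 0 <= n_select <= len(all_features):
        raise ValueError("n_select must be between 0 and the number of features")

    selected = set() if forward else set(all_features)

    def trial(feature: str) -> FrozenSet[str]:
        # Candidate subset after adding (forward) or removing (backward) one feature.
        changed = selected | {feature} if forward else selected - {feature}
        return frozenset(changed)

    while len(selected) != n_select:
        # Forward considers unselected features; backward considers selected ones.
        candidates = [f for f in all_features if (f not in selected) == forward]
        best = max(candidates, key=lambda f: score(trial(f)))
        selected = set(trial(best))

    # Return the result in the original feature order.
    return [f for f in all_features if f in selected]


# Hypothetical stand-in for "cross-validated random forest accuracy".
# The weights are invented purely so that the example is deterministic.
HYPOTHETICAL_WEIGHTS = {"feature_a": 5, "feature_b": 3, "feature_c": 1, "feature_d": 0}
FEATURES = list(HYPOTHETICAL_WEIGHTS)


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(HYPOTHETICAL_WEIGHTS[f] for f in subset)


if __name__ == "__main__":
    print(sequential_selection(FEATURES, toy_score, 2, forward=True))
    print(sequential_selection(FEATURES, toy_score, 2, forward=False))
    print(majority_vote(["positive", "negative", "positive"]))
```

Passing the scorer in as a function keeps the search loop independent of the classifier, which is what distinguishes a wrapper method: the same loop works with any model whose performance can be measured on a feature subset.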
