Abstract

The growing volume and complexity of data used in predictive toxicology call for efficient and effective feature selection methods during data pre-processing for data mining. In this paper, we propose a kNN model-based feature selection method (kNNMFS) aimed at overcoming the weaknesses of the ReliefF method. It modifies ReliefF by: (1) using a kNN model as the starter selection, which chooses a set of more meaningful representatives to replace the original data for feature selection; (2) integrating the Heterogeneous Value Difference Metric to handle heterogeneous applications, that is, those with both ordinal and nominal features; and (3) presenting a simple method of difference function calculation. The performance of kNNMFS was evaluated on the Phenols toxicity data set using a linear regression algorithm. Experimental results indicate that kNNMFS significantly improves classification accuracy on the trial data set.
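To make the role of the Heterogeneous Value Difference Metric (HVDM) more concrete, the following is a minimal sketch of the metric in its commonly cited form: numeric features contribute |x - y| / (4σ), nominal features contribute a class-conditional value difference, and missing values contribute a distance of 1. This is an illustration only, not the kNNMFS implementation; the function names `fit_hvdm` and `hvdm`, the use of `None` for missing values, and the handling of unseen nominal values are assumptions made for the example, and the paper's own difference function may differ.

```python
import math
from collections import Counter, defaultdict


def fit_hvdm(rows, labels, nominal):
    """Collect the statistics HVDM needs (hypothetical helper).

    rows    -- list of equal-length tuples of feature values
    labels  -- one class label per row
    nominal -- set of column indices holding nominal features
    """
    classes = sorted(set(labels))
    sigma = {}   # column index -> population standard deviation
    counts = {}  # column index -> {value: Counter of class labels}
    for a in range(len(rows[0])):
        if a in nominal:
            per_value = defaultdict(Counter)
            for row, c in zip(rows, labels):
                per_value[row[a]][c] += 1
            counts[a] = dict(per_value)
        else:
            col = [row[a] for row in rows]
            mean = sum(col) / len(col)
            sigma[a] = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return classes, sigma, counts


def hvdm(x, y, classes, sigma, counts):
    """Distance between two instances under the sketched HVDM."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0  # missing value: maximal per-feature distance
        elif a in counts:
            nx = counts[a].get(xa, Counter())
            ny = counts[a].get(ya, Counter())
            tx, ty = sum(nx.values()), sum(ny.values())
            if tx == 0 or ty == 0:
                d = 1.0  # assumption: unseen nominal value treated as unknown
            else:
                # Sum over classes of the difference in P(class | value).
                d = sum(abs(nx[c] / tx - ny[c] / ty) for c in classes)
        else:
            # Numeric feature: difference scaled by four standard deviations.
            d = abs(xa - ya) / (4 * sigma[a]) if sigma[a] > 0 else 0.0
        total += d * d
    return math.sqrt(total)


# Toy data: one numeric and one nominal feature, two classes.
rows = [(1.0, "a"), (3.0, "a"), (1.0, "b"), (3.0, "b")]
labels = [0, 0, 1, 1]
stats = fit_hvdm(rows, labels, nominal={1})
dist = hvdm((1.0, "a"), (3.0, "b"), *stats)
```

In a ReliefF-style method, a per-feature difference of this kind is what drives the feature weight updates, which is why a metric that covers both numeric and nominal features matters for heterogeneous data.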
