Abstract

This paper addresses the effect of data reduction on speeding up classifier training while preserving or improving classification accuracy. Many studies have focused on feature selection but have not adequately considered instance selection; our work therefore addresses both instance reduction and feature reduction, integrated into a single reduction method. In prior work we examined Simple Random Sample Selection without Replacement, integrated with the Information Gain-based Feature Selection method, and compared its performance against unintegrated instance selection and feature selection alone at a single reduction rate. Those results showed that the integration of instance and feature selection performed much better than instance or feature selection alone. In this paper, we examine our approach in more depth, testing different reduction rates and different distributions of the reduction rate between instance reduction and feature reduction. Our results show that for nearly all common classifiers, our integrated data reduction speeds up training significantly while keeping accuracy unchanged (and sometimes even improving it), even at high reduction rates. We also present the optimal tradeoff between feature and instance reduction rates.
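The integrated reduction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes simple random row sampling without replacement, followed by ranking features with an entropy-based information gain computed over equal-width bins. The function and parameter names (`reduce_data`, `instance_rate`, `feature_rate`) are illustrative.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def info_gain(col, y, bins=10):
    """Information gain of a continuous feature after equal-width binning:
    H(y) - H(y | binned feature)."""
    binned = np.digitize(col, np.histogram_bin_edges(col, bins=bins))
    cond = 0.0
    for b in np.unique(binned):
        mask = binned == b
        cond += mask.mean() * entropy(y[mask])
    return entropy(y) - cond

def reduce_data(X, y, instance_rate=0.5, feature_rate=0.5, seed=0):
    """Keep a random instance_rate fraction of rows (simple random sampling
    without replacement), then keep the top feature_rate fraction of
    columns ranked by information gain on the sampled rows."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(round(len(X) * instance_rate)))
    rows = rng.choice(len(X), size=n_keep, replace=False)
    Xs, ys = X[rows], y[rows]
    k = max(1, int(round(X.shape[1] * feature_rate)))
    gains = np.array([info_gain(Xs[:, j], ys) for j in range(X.shape[1])])
    cols = np.argsort(gains)[::-1][:k]
    return Xs[:, cols], ys

# Demo on synthetic data: 200 instances, 6 features, 2 informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
Xr, yr = reduce_data(X, y, instance_rate=0.5, feature_rate=0.5)
print(Xr.shape)  # (100, 3)
```

The key design point is that feature scores are computed on the already-sampled instances, so the feature-selection cost itself shrinks with the instance reduction rate; the split of a total reduction budget between `instance_rate` and `feature_rate` is the tradeoff the paper studies.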
