Abstract

Instance selection plays a critical role in enhancing the efficacy and efficiency of machine learning tools when utilised for a data mining task. This study proposes SpFixedIS, a fixed instance selection algorithm based on simultaneous perturbation stochastic approximation that works in conjunction with any supervised machine learning method and any corresponding performance metric. The algorithm provides an approximate solution to the NP-hard instance selection problem and also serves as a way of intelligently selecting a specified number of instances from a training set with regard to a given machine learning model. The shape of the objective function, obtained from the test accuracy against the number of instances selected, is examined extensively for the algorithm. SpFixedIS was tested on 43 diverse datasets across 6 different machine learning classifiers. The results show that in over 90% of cases, intelligent selection with SpFixedIS provides a statistically significant improvement at the 5% level over random selection of the same number of instances. Furthermore, for probabilistic models, specifically Gaussian Naive Bayes, SpFixedIS provides a statistically significant improvement over models that utilise the entire training set in 84% of the tested values, which range from 50 to 1000 instances.
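To make the idea concrete, the following is a minimal sketch of how simultaneous perturbation stochastic approximation could drive the selection of a fixed number of instances. It is not the SpFixedIS implementation: the importance-weight encoding, the constant gains `a` and `c`, the nearest-centroid classifier, the toy data generator and every function name below are assumptions chosen only for illustration. Each iteration perturbs all weights at once, evaluates the validation error of the two perturbed selections, and moves the weights along the resulting gradient estimate.

```python
import random

def fit_centroids(X, y):
    """Stand-in classifier (an assumption): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(centroids, row):
    """Label of the closest centroid (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])))

def subset_loss(indices, X_tr, y_tr, X_val, y_val):
    """Validation error of a model trained only on the selected instances."""
    model = fit_centroids([X_tr[i] for i in indices], [y_tr[i] for i in indices])
    wrong = sum(predict(model, r) != t for r, t in zip(X_val, y_val))
    return wrong / len(y_val)

def top_k(weights, k):
    """Indices of the k largest weights (ties keep the original order)."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]

def spsa_select(X_tr, y_tr, X_val, y_val, k, iters=100, a=0.5, c=0.1, seed=0):
    """Hypothetical SPSA loop over per-instance importance weights in [0, 1]."""
    rng = random.Random(seed)
    n = len(X_tr)
    w = [0.5] * n
    best_idx = top_k(w, k)
    best_loss = subset_loss(best_idx, X_tr, y_tr, X_val, y_val)
    for _ in range(iters):
        # One random +/-1 perturbation is applied to every weight at once.
        delta = [rng.choice((-1, 1)) for _ in range(n)]
        w_plus = [wi + c * d for wi, d in zip(w, delta)]
        w_minus = [wi - c * d for wi, d in zip(w, delta)]
        l_plus = subset_loss(top_k(w_plus, k), X_tr, y_tr, X_val, y_val)
        l_minus = subset_loss(top_k(w_minus, k), X_tr, y_tr, X_val, y_val)
        # Two loss evaluations give a gradient estimate for all n weights.
        diff = l_plus - l_minus
        w = [min(1.0, max(0.0, wi - a * diff / (2 * c * d)))
             for wi, d in zip(w, delta)]
        idx = top_k(w, k)
        cur = subset_loss(idx, X_tr, y_tr, X_val, y_val)
        if cur < best_loss:
            best_idx, best_loss = idx, cur
    return best_idx, best_loss

def make_toy_data(n, seed):
    """Two overlapping 2-D Gaussian classes; purely synthetic."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 2.0 * label
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y

if __name__ == "__main__":
    X_tr, y_tr = make_toy_data(200, seed=1)
    X_val, y_val = make_toy_data(100, seed=2)
    chosen, err = spsa_select(X_tr, y_tr, X_val, y_val, k=20)
    print(len(chosen), "instances selected, validation error", err)
```

In this sketch the classifier and the error metric enter only through `subset_loss`, which mirrors the claim that the approach can wrap any supervised learner and any performance metric. A random-selection baseline for the same `k` can be obtained by passing `random.sample(range(len(X_tr)), k)` to `subset_loss`.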
