Abstract

Classification is a supervised learning technique that typically uses two-thirds of a given annotated data set for training and the remainder for testing. In this paper, we develop a framework that uses less than one-third of the data set for training, tests on the remaining two-thirds, and still gives results comparable to other classifiers. To achieve good classification accuracy with small training sets, we focus on three issues. First, the one-third (30%) of the data used for training should represent the entire data set. Second, classification accuracy should be increased even with these small training sets. Third, deviations in the small training sets, such as noise or outliers, must be handled. We address the first issue by proposing three methods: dividing the instances into 10 bins based on their distance from the centroid, dividing them based on their distance from a reference point 3/2(min+max), and a distribution-specific binning. In all three methods, training sets are formed by stratified sampling, which ensures that the chosen samples come from the entire distribution. We address the second issue with ensemble-based weighted majority voting for classification. We tackle the third issue by applying four filters to the training sets: removing outliers with the Interquartile Range option (available in Weka), and removing misclassified instances using Naive Bayes, IB3, and IB5 as filters. Experiments are conducted on seven binary and multi-class data sets, taking only 6% to 18% of the total data for training; the three proposed methods are run both without and with the noise and outlier filters applied to the training sets. We compare our results with two popular ensemble methods, AdaBoost and bagging, and with the ENN, CNN, and RNN instance selection methods.
Empirical analysis shows that our three proposed methods yield classification results comparable to those reported in the literature for approaches that use small training sets.
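
To make the first two ideas concrete, the following is a minimal sketch, not the paper's implementation, of centroid-distance binning with stratified sampling and of weighted majority voting. Several details are assumptions, because the abstract does not specify them: the bins are taken to be equal-width over the range of distances, each non-empty bin contributes at least one instance, and the ensemble weights are arbitrary illustrative numbers (in practice they might be each member's validation accuracy). All function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def centroid_distance_bins(X, n_bins=10):
    """Assign each instance to one of n_bins equal-width bins by its
    Euclidean distance from the data centroid (bin ids 0..n_bins-1).
    Equal-width binning is an assumption made for this sketch."""
    dims = len(X[0])
    centroid = [sum(row[d] for row in X) / len(X) for d in range(dims)]
    dist = [math.dist(row, centroid) for row in X]
    lo, hi = min(dist), max(dist)
    width = (hi - lo) / n_bins or 1.0  # guard: all distances identical
    return [min(int((d - lo) / width), n_bins - 1) for d in dist]


def stratified_sample(bins, fraction, rng):
    """Draw about `fraction` of every non-empty bin, at least one
    instance per bin, without replacement. Returns sorted indices."""
    members = defaultdict(list)
    for i, b in enumerate(bins):
        members[b].append(i)
    chosen = []
    for b in sorted(members):
        k = max(1, round(fraction * len(members[b])))
        chosen.extend(rng.sample(members[b], k))
    return sorted(chosen)


def weighted_majority_vote(predictions, weights):
    """predictions: one list of labels per ensemble member.
    weights: one weight per member (assumed given).
    Returns, per instance, the label with the largest total weight."""
    voted = []
    for labels in zip(*predictions):
        score = defaultdict(float)
        for label, w in zip(labels, weights):
            score[label] += w
        voted.append(max(score, key=score.get))
    return voted


# Usage on synthetic data: sample about 10% of the instances for training.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(500)]
bins = centroid_distance_bins(X)
train_idx = stratified_sample(bins, 0.10, rng)

# Three hypothetical ensemble members voting on three instances.
votes = weighted_majority_vote(
    [[0, 1, 1], [1, 1, 0], [0, 0, 0]], [0.5, 0.3, 0.4]
)
```

Because every non-empty bin contributes to the sample, the training indices cover both the dense region near the centroid and the sparse outer region, which is the property the stratified approach is meant to guarantee.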
