This paper seeks a more cost-effective way to train oil spill classification systems by introducing active learning (AL) and exploring its potential, so that satisfactory classifiers can be learned with a reduced number of labeled samples. The dataset used has 143 oil spills and 124 look-alikes from 198 RADARSAT images covering the east and west coasts of Canada from 2004 to 2013. Six uncertainty-based active sample selecting (ACS) methods are designed to choose the most informative samples. A method for reducing information redundancy among the selected samples and a method with varying sample preference are also considered. Four classifiers (k-nearest neighbor (KNN), support vector machine (SVM), linear discriminant analysis (LDA) and decision tree (DT)) are coupled with the ACS methods to explore the interaction and possible preference between classifiers and ACS methods. Three kinds of measures are adopted to highlight different aspects of the classification performance of these AL-boosted classifiers. Overall, AL shows strong potential, reducing the number of training samples by 4% to 78% across the different settings. The SVM classifier proves to be the best suited to the AL framework, with perfect performance evolution curves under the different kinds of measures. The exploration and exploitation criterion can further improve the performance of the AL-boosted SVM classifier but not that of the other classifiers.
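To make the idea of uncertainty-based sample selection concrete, the following is a minimal sketch of one common strategy: picking the unlabeled samples whose predicted probability of being an oil spill lies closest to 0.5. It is not claimed to be any of the six ACS methods of the paper; the function name `select_most_uncertain`, the parameter `batch_size` and the example probabilities are all assumptions made for illustration.

```python
from typing import List, Sequence


def select_most_uncertain(probabilities: Sequence[float], batch_size: int) -> List[int]:
    """Return indices of the samples whose predicted positive-class
    probability (e.g. "oil spill") is closest to 0.5.

    Hypothetical helper: a generic least-confidence rule for a binary
    problem, not a method taken from the paper.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    # Distance from 0.5 is smallest where the classifier is least certain.
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Made-up classifier outputs for five unlabeled dark-spot candidates.
pool_probabilities = [0.95, 0.52, 0.10, 0.45, 0.70]
chosen = select_most_uncertain(pool_probabilities, batch_size=2)
print(chosen)  # indices of the two samples to send for labeling
```

In an AL loop, the selected samples would be labeled by an expert, added to the training set, and the classifier retrained before the next round of selection.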