Abstract

This paper deals with learning from unlabeled or noisy-labeled data in the context of a classification problem. Because the outcome of a classification problem takes one of a discrete set of values, assumptions on the expected outcomes can be established to obtain the most likely prediction model at the training stage. In this paper, a novel case-based model selection method is proposed that combines hypothesis testing over a discrete set of expected outcomes with feature extraction within a cross-validated classification stage. This wrapper-type procedure acts on fully observable variables under hypothesis testing and improves classification accuracy on the test set, or at least keeps performance at the level of the statistical classifier. The model selection strategy in the cross-validation loop allows building an ensemble classifier that can improve the performance of any expert and intelligent system, particularly on small-sample-size datasets. Experiments were carried out on several databases, yielding a clear improvement over the baseline; e.g., on the SPECT dataset, Acc = 86.35 ± 1.51, with Sen = 91.10 ± 2.77 and Spe = 81.11 ± 1.61. In addition, the cross-validation (CV) error estimate for the classifier under our approach was found to be an almost unbiased estimate (as with the baseline approach) of the true error that the classifier would incur on independent data.
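To make the general structure concrete, the sketch below shows a wrapper-type model selection step inside a cross-validation loop that accumulates the per-fold winners into a majority-vote ensemble, and reports the per-fold test errors as the CV error estimate. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: the candidate pool, the fold counts, and the helper name candidate_models are hypothetical, the selection criterion here is plain inner-CV accuracy rather than the paper's hypothesis testing over expected outcomes, and a public dataset stands in for SPECT.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # illustrative stand-in for SPECT

def candidate_models():
    # Hypothetical pool of statistical classifiers to select among.
    return [LogisticRegression(max_iter=1000),
            SVC(kernel="linear"),
            DecisionTreeClassifier(max_depth=3)]

outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
ensemble, fold_errors = [], []

for train_idx, test_idx in outer.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_te, y_te = X[test_idx], y[test_idx]
    # Wrapper-type selection on the training fold only; the paper's
    # case-based hypothesis-testing step would plug in here instead of
    # the simple inner-CV accuracy used in this sketch.
    scores = [cross_val_score(m, X_tr, y_tr, cv=3).mean()
              for m in candidate_models()]
    best = candidate_models()[int(np.argmax(scores))]
    best.fit(X_tr, y_tr)
    ensemble.append(best)
    # Error on the held-out fold: averaged, this is the CV error estimate.
    fold_errors.append(1.0 - best.score(X_te, y_te))

# Majority vote over the per-fold selected models forms the ensemble.
votes = np.array([m.predict(X) for m in ensemble])
y_hat = (votes.mean(axis=0) >= 0.5).astype(int)
print("CV error estimate:", np.mean(fold_errors))

Because each fold's winner is evaluated only on data excluded from its selection and training, averaging the fold errors gives the nearly unbiased estimate of generalization error that the abstract refers to.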


