A structured perspective of volumes on active learning

Xiaofeng Cao

doi:10.1016/j.neucom.2019.10.056

Abstract

We approximate the version space, which covers all feasible classification hypotheses, by a structured geometric hypersphere under an agnostic distribution. From this structured perspective, we divide the available active learning (AL) sampling approaches into two kinds of strategies: Outer Volume Sampling and Inner Volume Sampling. The outer volume is represented by a circumscribed hypersphere that globally excludes any outlier (non-promising) hypothesis from the version space. The inner volume is represented by many inscribed hyperspheres, which cover most of the hypotheses within the outer volume. To enhance the performance of AL, we aggregate the two kinds of volumes and eliminate their sampling biases by finding the optimal inscribed hyperspheres in the space enclosed by the outer volume. We then propose a Volume-based Model for AL sampling that makes no distribution assumption. To generalize our theoretical model to a non-linear feature space spanned by a kernel, we use sequential optimization to globally reduce the original space to a sparse space by halving the size of the kernel space. An expectation maximization (EM) model, which returns the local centers, then helps us find a local representation. To describe this process, we propose an easy-to-implement algorithm called Volume-based AL (VAL). Empirical evaluation on a variety of structured clustering data sets and unstructured handwritten digit data sets demonstrates that the proposed model accelerates the decline of the prediction error rate with fewer sampled points than the other algorithms.
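
The two-stage idea in the abstract (exclude outliers with an outer hypersphere, then pick local representatives inside it) can be illustrated with a minimal sketch. This is not the VAL algorithm itself: the function name `volume_based_queries`, the centroid-plus-quantile-radius stand-in for the circumscribed hypersphere, and the use of plain k-means (a hard-assignment form of EM) for the local centers are all assumptions made for illustration, and the kernel-space halving step is not modeled.

```python
import numpy as np

def volume_based_queries(X, n_queries, outer_quantile=0.9, n_iter=20, seed=0):
    """Illustrative only: return indices of points to query for labels."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Outer volume (assumed form): a hypersphere centered at the data mean
    # whose radius is a quantile of the distances, so far-away points drop out.
    center = X.mean(axis=0)
    dist = np.linalg.norm(X - center, axis=1)
    radius = np.quantile(dist, outer_quantile)
    inside = np.flatnonzero(dist <= radius)
    P = X[inside]

    # Inner volumes (assumed form): k-means, i.e. hard-assignment EM,
    # returns one local center per requested query.
    centers = P[rng.choice(len(P), size=n_queries, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_queries):
            members = P[labels == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)

    # Query the retained point closest to each local center.
    d = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
    nearest = d.argmin(axis=0)
    # Duplicates are merged, so fewer than n_queries indices may be returned.
    return np.unique(inside[nearest])
```

Calling `volume_based_queries(X, n_queries=5)` on an array of shape `(n_samples, n_features)` returns up to five row indices into `X`. Points outside the assumed outer hypersphere are never selected.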

Full Text