Abstract
We approximate the version space, which covers all feasible classification hypotheses, by a structured geometric hypersphere under an agnostic distribution. From this perspective, we divide the available active learning (AL) sampling approaches into two kinds of strategies: Outer Volume Sampling and Inner Volume Sampling. The outer volume is represented by a circumscribed hypersphere that globally excludes any outlier (non-promising) hypothesis from the version space. The inner volume is represented by many inscribed hyperspheres that cover most of the hypotheses within the outer volume. To improve AL performance, we aggregate the two kinds of volumes and eliminate their sampling biases by finding the optimal inscribed hyperspheres in the space enclosed by the outer volume. On this basis, we propose a Volume-based Model for AL sampling that makes no distribution assumption. To generalize the theoretical model to a non-linear feature space spanned by a kernel, we use sequential optimization to globally reduce the original space to a sparse space by halving the size of the kernel space. An expectation maximization (EM) model, which returns the local center, then yields a local representation. We describe this process as an easy-to-implement algorithm called Volume-based AL (VAL). Empirical evaluation on a varied set of structured clustering data sets and unstructured handwritten digit data sets demonstrates that the proposed model reduces the prediction error rate faster, and with fewer sampled points, than the other algorithms compared.
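The two-stage idea described above (a global outer hypersphere followed by local centers found with EM) can be illustrated with a minimal sketch. This is not the authors' VAL algorithm: the abstract gives no formulas, so the mean-centered sphere, the quantile radius, the spherical Gaussian mixture, and the names `outer_volume_mask`, `em_local_centers`, `select_queries`, and `keep` are all assumptions made for illustration. The kernel-space halving step is omitted entirely.

```python
import numpy as np


def outer_volume_mask(X, keep=0.9):
    """Mark points inside a hypersphere centred on the data mean.

    Assumption of this sketch: the radius is the `keep` quantile of the
    distances to the centre, so the farthest points count as outliers.
    """
    center = X.mean(axis=0)
    dist = np.linalg.norm(X - center, axis=1)
    return dist <= np.quantile(dist, keep)


def em_local_centers(X, k, n_iter=50, seed=0):
    """Fit a spherical Gaussian mixture with plain EM; return the k means."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)].copy()
    var = np.full(k, X.var() + 1e-9)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_p = np.log(weights) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * sq / var
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update means, per-component variances and mixing weights.
        nk = resp.sum(axis=0) + 1e-12
        means = (resp.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (resp * sq).sum(axis=0) / (d * nk) + 1e-9
        weights = nk / n
    return means


def select_queries(X, k, keep=0.9, seed=0):
    """Return indices of the points nearest to each local centre."""
    inside = np.flatnonzero(outer_volume_mask(X, keep))
    centers = em_local_centers(X[inside], k, seed=seed)
    dist = np.linalg.norm(X[inside][:, None, :] - centers[None, :, :], axis=2)
    # One candidate per centre; duplicates are merged.
    return np.unique(inside[dist.argmin(axis=0)])
```

Calling `select_queries(X, k)` on an unlabeled pool `X` of shape `(n, d)` returns at most `k` indices to send for labeling. The outer stage discards far-away points before EM runs, so isolated outliers cannot be chosen as representatives.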