Abstract

We observe a training set $Q$ composed of $l$ labeled samples $\{(X_1,\theta_1),\ldots,(X_l,\theta_l)\}$ and $u$ unlabeled samples $\{X_1',\ldots,X_u'\}$. The labels $\theta_i$ are independent random variables satisfying $\Pr\{\theta_i=1\}=\eta$, $\Pr\{\theta_i=2\}=1-\eta$. The labeled observations $X_i$ are independently distributed with conditional density $f_{\theta_i}(\cdot)$ given $\theta_i$. Let $(X_0,\theta_0)$ be a new sample, independently distributed as the samples in the training set. We observe $X_0$ and we wish to infer the classification $\theta_0$. In this paper we first assume that the distributions $f_1(\cdot)$ and $f_2(\cdot)$ are given and that the mixing parameter $\eta$ is unknown. We show that the relative value of labeled and unlabeled samples in reducing the risk of optimal classifiers is the ratio of the Fisher informations they carry about the parameter $\eta$. We then assume that two densities $g_1(\cdot)$ and $g_2(\cdot)$ are given, but we do not know whether $g_1(\cdot)=f_1(\cdot)$ and $g_2(\cdot)=f_2(\cdot)$ or whether the opposite holds, nor do we know $\eta$. Thus the learning problem consists of both estimating the optimum partition of the observation space and assigning the classifications to the decision regions. Here, we show that labeled samples are necessary to construct a classification rule and that they are exponentially more valuable than unlabeled samples.
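
The Fisher-information comparison in the first setting (known class densities, unknown mixing parameter) can be illustrated numerically. The following is a minimal sketch, not the paper's own procedure. It uses the standard expressions for the Fisher information about the mixing parameter: 1/(eta(1-eta)) for a labeled sample, since only the Bernoulli label depends on eta, and the integral of (f1 - f2)^2 / (eta f1 + (1 - eta) f2) for an unlabeled sample drawn from the mixture. The unit-variance Gaussian class densities, the means, the integration grid, and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np


def gaussian_pdf(x, mean, std=1.0):
    """Normal density, used here only as an illustrative class density."""
    z = (x - mean) / std
    return np.exp(-0.5 * z * z) / (std * np.sqrt(2.0 * np.pi))


def fisher_info_labeled(eta):
    """Fisher information about eta in one labeled sample.

    With f1 and f2 known, only the Bernoulli(eta) label depends on eta.
    """
    return 1.0 / (eta * (1.0 - eta))


def fisher_info_unlabeled(eta, f1, f2, grid):
    """Fisher information about eta in one unlabeled sample drawn from the
    mixture eta*f1 + (1-eta)*f2, approximated by a Riemann sum on a uniform grid."""
    p1, p2 = f1(grid), f2(grid)
    mixture = eta * p1 + (1.0 - eta) * p2
    dx = grid[1] - grid[0]
    return float(np.sum((p1 - p2) ** 2 / mixture) * dx)


# Hypothetical example: class means at -1 and +1, equal priors.
grid = np.linspace(-20.0, 20.0, 40001)
f1 = lambda x: gaussian_pdf(x, -1.0)
f2 = lambda x: gaussian_pdf(x, 1.0)
eta = 0.5

i_labeled = fisher_info_labeled(eta)
i_unlabeled = fisher_info_unlabeled(eta, f1, f2, grid)
print("labeled:", i_labeled)
print("unlabeled:", i_unlabeled)
print("ratio labeled/unlabeled:", i_labeled / i_unlabeled)
```

Under these assumptions, the unlabeled information approaches the labeled value as the two class densities separate, and it drops to zero when the densities coincide, so the ratio depends strongly on how much the classes overlap.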
