Abstract

We investigate the relative value of labeled and unlabeled samples in constructing classification rules. We observe a training set $Q$ composed of $l$ labeled and $u$ unlabeled samples coming from two classes. Let samples from class 1 be distributed according to $f_1(\cdot)$, samples from class 2 according to $f_2(\cdot)$, and let $\eta$ be the probability that a sample is in class 1. Assume that $f_1(\cdot)$ and $f_2(\cdot)$ are known and that $\eta$ is unknown. We want to classify a new sample $X_0$. The relative value of labeled and unlabeled observations in reducing the probability of error is equal to $I_l(\eta)/I_u(\eta)$, the ratio of the Fisher information of the labeled samples to that of the unlabeled samples. Moreover, labeled samples are not necessary in order to construct a decision rule. However, if $f_1(\cdot)$ and $f_2(\cdot)$ are given, but it is not known whether observations from class 1 are distributed according to $f_1(\cdot)$ or according to $f_2(\cdot)$, then the labeled samples are necessary and are exponentially more valuable than unlabeled samples.
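As a rough numerical illustration of the Fisher information ratio, the sketch below evaluates both quantities for one assumed pair of densities. The unit-variance Gaussians centered at -1 and +1, the value 0.3 for the mixing parameter, the integration grid, and all function names are assumptions made for this example and do not come from the paper. It uses the standard per-sample expressions: a labeled sample carries Fisher information $1/(\eta(1-\eta))$ about the mixing parameter, and an unlabeled sample drawn from the mixture $\eta f_1 + (1-\eta) f_2$ carries $\int (f_1 - f_2)^2 / (\eta f_1 + (1-\eta) f_2)\,dx$.

```python
import math


def gaussian_pdf(x, mean, std=1.0):
    """Density of a normal distribution (an assumed choice for f1 and f2)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fisher_labeled(eta):
    """Fisher information about eta in one labeled sample: 1 / (eta * (1 - eta))."""
    return 1.0 / (eta * (1.0 - eta))


def fisher_unlabeled(eta, f1, f2, lo=-12.0, hi=12.0, steps=20000):
    """Fisher information about eta in one unlabeled sample from the mixture.

    Approximates the integral of (f1 - f2)^2 / (eta*f1 + (1-eta)*f2)
    with the trapezoidal rule on [lo, hi] (an assumed truncation range).
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        a, b = f1(x), f2(x)
        mix = eta * a + (1.0 - eta) * b
        term = (a - b) ** 2 / mix if mix > 0.0 else 0.0
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * term
    return total * h


def f1(x):
    return gaussian_pdf(x, -1.0)  # assumed class-1 density


def f2(x):
    return gaussian_pdf(x, 1.0)  # assumed class-2 density


eta = 0.3  # assumed mixing parameter
i_l = fisher_labeled(eta)
i_u = fisher_unlabeled(eta, f1, f2)
ratio = i_l / i_u  # how many unlabeled samples one labeled sample is worth
print(f"I_l = {i_l:.4f}, I_u = {i_u:.4f}, ratio = {ratio:.4f}")
```

Because discarding the label cannot add information about the mixing parameter, the ratio is at least 1. It approaches 1 as the two densities separate and grows without bound as they become harder to tell apart.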
