Abstract

Semi-supervised learning (SSL) is the process of training decision functions using small amounts of labeled data and relatively large amounts of unlabeled data. In many applications, annotating training data is time-consuming and error-prone. Speech recognition is a typical example: producing an accurate system requires large amounts of meticulously annotated speech data (Evermann et al., 2005). In document classification for Internet search, it is not even feasible to accurately annotate a relatively large number of web pages for all categories of potential interest. SSL is useful in many machine learning applications because only a relatively small portion of the available data needs to be annotated.

SSL is related to the problem of transductive learning (Vapnik, 1998). In general, a learner is transductive if it is designed for prediction on only a closed data set, where the test set is revealed at training time. In practice, however, transductive learners can be modified to handle unseen data (Sindhwani et al., 2005; Zhu, 2005a). Chapter 25 of Chapelle et al. (2007) discusses the relationship between SSL and transductive learning in full. In this chapter, SSL refers to the semi-supervised transductive classification problem.

Let x ∈ X denote the input to the decision function (classifier) f, and let y ∈ Y denote its output label, i.e., f : X → Y. In most cases, f(x) = argmax_{y ∈ Y} p(y|x). In SSL, certain reasonable assumptions are made so that properties of the distribution p(x), which is available from the unlabeled data sampled from p(x), can influence p(y|x). These assumptions are as follows:
