Towards Making Unlabeled Data Never Hurt.

Yu-Feng Li, Zhi-Hua Zhou

doi:10.1109/tpami.2014.2299812

Abstract

It is usually expected that learning performance can be improved by exploiting unlabeled data, particularly when the amount of labeled data is limited. However, it has been reported that, in some cases, existing semi-supervised learning approaches perform even worse than supervised ones that use only labeled data. For this reason, it is desirable to develop safe semi-supervised learning approaches that will not significantly reduce learning performance when unlabeled data are used. This paper focuses on improving the safeness of semi-supervised support vector machines (S3VMs). First, the S3VM-us approach is proposed. It employs a conservative strategy and uses only the unlabeled instances that are very likely to be helpful, while avoiding the use of highly risky ones. This approach improves safeness, but its performance improvement using unlabeled data is often much smaller than that of S3VMs. In order to develop a safe and well-performing approach, we examine the fundamental assumption of S3VMs, i.e., low-density separation. Based on the observation that multiple good candidate low-density separators may be identified from training data, safe semi-supervised support vector machines (S4VMs) are here proposed. This approach uses multiple low-density separators to approximate the ground-truth decision boundary and maximizes the improvement in performance over inductive SVMs for any candidate separator. Under the assumption employed by S3VMs, it is here shown that S4VMs are provably safe and that the performance improvement using unlabeled data can be maximized. An out-of-sample extension of S4VMs is also presented. This extension allows S4VMs to make predictions on unseen instances. Our empirical study on a broad range of data shows that the overall performance of S4VMs is highly competitive with S3VMs and that, in contrast to S3VMs, which hurt performance significantly in many cases, S4VMs rarely perform worse than inductive SVMs.
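The worst-case idea in the abstract (pick the labeling whose improvement over the inductive SVM is largest for any candidate separator) can be illustrated with a toy sketch. This is not the authors' implementation. The names `improvement` and `safest_labeling`, the gain/loss counting rule, the trade-off weight `lam`, and the brute-force search are all assumptions made for illustration, and the search is only feasible for a handful of unlabeled points.

```python
from itertools import product

def improvement(y, y_star, y_svm, lam=1.0):
    """Gain minus lam * loss of labeling y relative to the baseline y_svm,
    under the hypothesis that y_star is the true labeling.
    (Assumed scoring rule, chosen for illustration.)"""
    # Gain: y is right where the baseline SVM is wrong.
    gain = sum(1 for a, t, s in zip(y, y_star, y_svm) if a == t and t != s)
    # Loss: y is wrong where the baseline SVM is right.
    loss = sum(1 for a, t, s in zip(y, y_star, y_svm) if a != t and t == s)
    return gain - lam * loss

def safest_labeling(candidates, y_svm, lam=1.0):
    """Search all labelings in {-1, +1}^n for the one with the best
    worst-case improvement over the candidate labelings.
    Starting from y_svm (worst-case improvement 0) means the result
    never scores below the baseline under this criterion."""
    best = tuple(y_svm)
    best_score = min(improvement(best, c, y_svm, lam) for c in candidates)
    for y in product((-1, 1), repeat=len(y_svm)):
        score = min(improvement(y, c, y_svm, lam) for c in candidates)
        if score > best_score:
            best, best_score = y, score
    return best, best_score

# Hypothetical predictions on four unlabeled points.
y_svm = [1, 1, -1, -1]                 # inductive SVM baseline
candidates = [[1, -1, -1, -1],         # labeling from candidate separator 1
              [1, -1, -1, 1]]          # labeling from candidate separator 2
best, score = safest_labeling(candidates, y_svm)
print(best, score)
```

In this example both candidate separators disagree with the baseline on the second point, so flipping that label helps whichever candidate is correct. The candidates disagree with each other on the fourth point, so the sketch leaves that label at the baseline value.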

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication Date: Jan 1, 2015
Citations: 282

Similar Papers

Co-Tracking Using Semi-Supervised Support Vector Machines
Feng Tang ... Qi Zhao
01 Jan 2007

On semi-supervised linear regression in covariate shift problems
Journal of Machine Learning Research | VOL. 16
01 Jan 2015

Modified criterion to select useful unlabeled data for improving semi-supervised support vector machines
Thanh-Binh Le ... Sang-Woon Kim
Pattern Recognition Letters | VOL. 60-61
04 May 2015

An empirical study of ensemble-based semi-supervised learning approaches for imbalanced splice site datasets.
Ana Stanescu ... Doina Caragea
BMC Systems Biology | VOL. 9 Suppl 5
01 Jan 2015
