Abstract

Positive and unlabeled (PU) learning is an important research topic in machine learning that aims to learn a good classifier from PU data. Owing to its wide range of applications, a variety of PU algorithms with promising performance have been proposed. However, few of them consider learning from high-dimensional PU data. To fill this gap, this work focuses on designing a sparse PU classifier. This is difficult because the labels of the unlabeled (U) samples are uncertain. To this end, a loss matrix-based alternating optimization method, named LMAO-PU, is proposed, in which a two-stage alternating optimization scheme addresses this difficulty. First, a loss matrix is designed to measure the classification performance of the PU classifier on U samples. Then, a two-stage alternating optimization guided by the loss matrix is developed. Specifically, the first stage optimizes the PU model to address the label uncertainty of U samples, where two objectives, TPR (True Positive Rate) and a proposed LRU (Loss Rate on Unlabeled), are optimized simultaneously. The second stage performs sparse model optimization, where Sparsity and ErrorRate are adopted as the objectives to obtain a sparse model. The two stages are carried out alternately, and a high-quality sparse PU classifier is finally obtained. Experiments on 10 high-dimensional datasets demonstrate the superiority of the proposed method over six state-of-the-art baselines in terms of sparsity, accuracy, and area under the ROC curve.
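
As a rough illustration of two of the objectives named above, the following sketch computes TPR on labeled positive samples and Sparsity for a linear classifier. The linear model form, the function names `true_positive_rate` and `sparsity`, the zero threshold `tol`, and the toy data are all assumptions made for illustration; the abstract does not specify the model class, and LRU and ErrorRate are omitted because their definitions are not given here.

```python
import numpy as np

def true_positive_rate(w, b, X_pos):
    """Fraction of labeled positive samples that a linear model scores as positive.

    Assumes a hypothetical linear classifier: predict positive when w.x + b > 0.
    """
    scores = X_pos @ w + b
    return float(np.mean(scores > 0))

def sparsity(w, tol=1e-12):
    """Fraction of weights that are numerically zero (higher means sparser)."""
    return float(np.mean(np.abs(w) <= tol))

# Toy example: 4 features, 2 of which carry zero weight.
w = np.array([1.0, 0.0, 0.0, -1.0])
b = 0.0
X_pos = np.array([
    [2.0, 5.0, 1.0, 1.0],   # score  1.0
    [0.0, 3.0, 2.0, 1.0],   # score -1.0
    [3.0, 0.0, 0.0, 1.0],   # score  2.0
    [1.0, 1.0, 1.0, 0.5],   # score  0.5
])

tpr_value = true_positive_rate(w, b, X_pos)
sparsity_value = sparsity(w)
print(tpr_value, sparsity_value)
```

In a multi-objective setting such as the one described, values like these would be traded off against each other rather than combined into a single score; how LMAO-PU does so is not stated in the abstract.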
