Graph-based semi-supervised learning (GSSL) has received much attention recently. Despite the progress made by recent methods, two main shortcomings remain. First, the graph is very often built in advance with a heuristic, independently of the task at hand, and generally does not represent the true topology of the data. Second, many models cannot handle a very large number of unlabeled samples, which can make the GSSL solution impractical in terms of computational resources. In this paper, we propose the Weighted Simultaneous Graph Construction and Reduced Flexible Manifold Embedding (W-SGRFME) method, a scalable and inductive GSSL framework. The main contributions are as follows. First, we extend the concept of graph topology imbalance to large datasets. Second, we integrate the computed weights of the labeled samples into a unified semi-supervised model that jointly estimates the labels of the unlabeled samples, the mapping from the feature space to the label space, and the graph matrix of the anchor graph. Moreover, the fusion of the labels and features of the anchors is used to construct the graph adaptively. Experimental results on three large semi-supervised learning datasets (NORB, RCV1, and Covtype) show the effectiveness of the proposed method and its superiority over existing scalable models.