By combining structural information with the nonparallel support vector machine, the structural nonparallel support vector machine (SNPSVM) can fully exploit prior knowledge to directly improve the algorithm's generalization capacity. However, the scalability issue of how to train SNPSVM efficiently on very high-dimensional data has not been studied. In this paper, we integrate linear SNPSVM with the b-bit minwise hashing scheme to speed up the training phase for large-scale, high-dimensional statistical learning, and we then address the problem of speeding up its prediction phase via locality-sensitive hashing. For one-against-one multi-class classification, a two-stage strategy is put forward: in the first stage, a series of hash-based classifiers approximate the exact results and filter the hypothesis space; in the second stage, the classification is refined by solving a multi-class SNPSVM on the remaining classes. The proposed method can handle large-scale classification problems with a huge number of features. Experimental results on two large-scale datasets (news20 and webspam) demonstrate the efficiency of structural learning via b-bit minwise hashing, and results on the ImageNet-BOF dataset and several large-scale UCI datasets show that the proposed hash-based prediction can be more than two orders of magnitude faster than the exact classifier with only minor losses in quality.
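The training-phase speedup rests on b-bit minwise hashing, which replaces each high-dimensional binary feature vector with a short, expanded fingerprint that a linear classifier such as linear SNPSVM can consume directly. Below is a minimal sketch of that feature transformation in Python; the function name, the parameters k (number of hash functions) and b (bits kept per hash), and the use of random universal hashing in place of true random permutations are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def b_bit_minwise_features(nonzero_indices, k=200, b=8, seed=0):
    """Map the set of nonzero feature indices of one sample to a sparse
    binary vector of length k * 2**b via b-bit minwise hashing.
    Random universal hash functions (a*x + c) mod p stand in for the
    random permutations used in the minwise hashing literature."""
    rng = np.random.RandomState(seed)
    p = (1 << 31) - 1                                   # prime larger than any index
    a = rng.randint(1, p, size=k, dtype=np.int64)
    c = rng.randint(0, p, size=k, dtype=np.int64)
    idx = np.asarray(list(nonzero_indices), dtype=np.int64)
    # For each of the k hash functions, take the minimum hash value over
    # the sample's nonzero indices, then keep only its lowest b bits.
    hashed = (a[:, None] * idx[None, :] + c[:, None]) % p   # shape (k, nnz)
    mins = hashed.min(axis=1)                                # shape (k,)
    low_bits = (mins & ((1 << b) - 1)).astype(np.int64)
    # Expand into a one-hot block per hash function: slot j activates
    # position j * 2**b + low_bits[j] of the output vector.
    out = np.zeros(k * (1 << b), dtype=np.float32)
    out[np.arange(k) * (1 << b) + low_bits] = 1.0
    return out

# Example: a document represented by the indices of its active features.
x = b_bit_minwise_features({3, 17, 123456, 9999991}, k=200, b=8)
print(x.shape, x.sum())   # (51200,) 200.0
```

Each sample thus becomes a k-sparse vector of fixed length k * 2**b (51,200 dimensions for k = 200 and b = 8), typically far smaller than the raw feature space of datasets such as webspam, so the linear classifier can be trained on the hashed representation instead of the original features.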