Empirical Study of the Universum SVM Learning for High-Dimensional Data

Vladimir Cherkassky, Wuyang Dai

doi:10.1007/978-3-642-04274-4_96

Abstract

Many applications of machine learning involve sparse high-dimensional data, where the number of input features is (much) larger than the number of data samples, d ≫ n. Predictive modeling of such data is ill-posed and prone to overfitting. Several recent studies of high-dimensional data employ a new learning methodology called Learning through Contradictions, or Universum Learning, due to Vapnik (1998, 2006). This method incorporates a priori knowledge about the application data into the learning process in the form of additional Universum samples. This paper investigates the generalization properties of the Universum-SVM and how they relate to characteristics of the data. We describe practical conditions for evaluating the effectiveness of the Random Averaging Universum.

Keywords: Training Data, Univariate Projection, Label Training Data, Random Average, Normal Direction Vector
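The abstract names the Random Averaging Universum without defining it. As commonly described in the Universum learning literature, each such sample is the average of one randomly chosen training sample from the positive class and one from the negative class. The following is a minimal sketch under that assumption; the function name `random_averaging_universum`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import random


def random_averaging_universum(pos, neg, m, seed=0):
    """Build m Universum samples by averaging random (positive, negative) pairs.

    pos, neg: lists of equal-length feature vectors (lists of floats).
    m: number of Universum samples to generate.
    seed: seed for the local random generator, for reproducibility.
    """
    rng = random.Random(seed)
    universum = []
    for _ in range(m):
        p = rng.choice(pos)  # one sample from the positive class
        q = rng.choice(neg)  # one sample from the negative class
        # Coordinate-wise average of the two samples.
        universum.append([(a + b) / 2.0 for a, b in zip(p, q)])
    return universum


# Toy data (illustrative only): two samples per class, three features each.
pos = [[1.0, 2.0, 0.0], [3.0, 0.0, 1.0]]
neg = [[-1.0, 0.0, 2.0], [-3.0, 2.0, 1.0]]
U = random_averaging_universum(pos, neg, m=5)
```

Each generated sample has the same dimensionality as the training data and carries no class label. Training a Universum-SVM on these samples is a separate step that is not shown here.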

Full Text