Abstract
The support vector machine (SVM) is a popular classification method for the analysis of high-dimensional data such as genomic data. Recently, a number of linear SVM methods have been developed to achieve feature selection through either frequentist regularization or Bayesian shrinkage, but the linearity assumption may not be plausible in many real applications. In addition, recent work has demonstrated that incorporating existing biological knowledge, such as that from functional genomics, into the statistical analysis of genomic data offers great promise for improved predictive accuracy and feature selection. Such biological knowledge can often be represented by graphs. In this article, we propose a novel knowledge-guided nonlinear Bayesian SVM approach for the analysis of high-dimensional data. Our model uses graph information that represents the relationships among the features to guide feature selection. To achieve knowledge-guided feature selection, we assign an Ising prior to the indicators representing inclusion or exclusion of the features in the model. An efficient MCMC algorithm is developed for posterior inference. In extensive simulation studies, the performance of our method is evaluated and compared with several penalized linear SVMs and the standard kernel SVM in terms of prediction and feature selection. In addition, analyses of genomic data from a cancer study show that our method yields a more accurate prediction model for patient survival and reveals biologically more meaningful results than the existing methods.
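To make the role of the Ising prior concrete, the following is a minimal sketch rather than the model specified in the article. It assumes one common parametrization, in which the unnormalized log-prior of the binary inclusion indicators is a sparsity term plus a reward for each graph edge whose two endpoints are both included. The function name `ising_log_prior` and the hyperparameters `mu` and `eta` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ising_log_prior(gamma, adjacency, mu=-1.0, eta=0.5):
    """Unnormalized log-probability of binary inclusion indicators.

    Assumed form (illustrative, not taken from the article):
        log p(gamma) = mu * sum_j gamma_j
                       + eta * sum_{j~k} gamma_j * gamma_k + const
    where j~k ranges over the undirected edges of the feature graph.
    """
    gamma = np.asarray(gamma, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)
    # Sparsity term: a negative mu penalizes every included feature.
    sparsity = mu * gamma.sum()
    # Smoothness term: each edge with both endpoints included adds eta.
    # The factor 0.5 undoes the double counting of a symmetric adjacency matrix.
    smoothness = 0.5 * eta * (gamma @ adjacency @ gamma)
    return sparsity + smoothness


# Toy feature graph: features 0-1-2 form a chain, feature 3 is isolated.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])

# Two selections of the same size: neighbors in the graph versus unrelated features.
connected = ising_log_prior([1, 1, 0, 0], A)
scattered = ising_log_prior([1, 0, 0, 1], A)
print(connected > scattered)
```

Under these assumed hyperparameters, selecting two features that are linked in the graph receives a higher prior score than selecting two unlinked features, which is the sense in which the graph guides feature selection.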