Abstract

We develop a fully automated pattern search methodology for model selection of support vector machines (SVMs) for regression and classification. Pattern search (PS) is a derivative-free optimization method suited to low-dimensional optimization problems in which derivatives are difficult or impossible to calculate. The methodology was motivated by an application in drug design in which regression models are constructed from a few high-dimensional exemplars. Automatic model selection in such underdetermined problems is essential to avoid overfitting and to avoid overestimating generalization capability, as happens when parameters are selected based on testing results. We focus on SVM model selection for regression based on leave-one-out (LOO) and cross-validated estimates of mean squared error, but the search strategy is applicable to any model criterion. Because the resulting error surface is an extremely noisy map of model quality with many local minima, the generalization capacity of any single locally optimal model exhibits high variance. Thus, several locally optimal SVM models are generated and then bagged or averaged to produce the final SVM. This strategy of pattern search combined with model averaging has proven very effective on benchmark tests and in high-variance drug design domains with a high potential for overfitting.
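The overall strategy (derivative-free search of a LOO error surface from several starting points, followed by averaging the locally optimal models) can be illustrated with a minimal sketch. It is not the authors' implementation: the search shown is a basic compass-style pattern search, a Gaussian kernel smoother with a single log-bandwidth parameter stands in for the SVM so that the example needs only the Python standard library, and the data are synthetic. All function names (`pattern_search`, `smoother_predict`, `loo_mse`, `averaged_predict`) are hypothetical.

```python
import math
import random


def pattern_search(objective, start, step=1.0, tol=1e-3, max_iter=200):
    """Compass-style pattern search: poll +/- step along each axis,
    move to the best improving point, otherwise halve the step."""
    x = list(start)
    fx = objective(x)
    for _ in range(max_iter):
        if step < tol:
            break
        best_x, best_f = x, fx
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                f_trial = objective(trial)
                if f_trial < best_f:
                    best_x, best_f = trial, f_trial
        if best_f < fx:
            x, fx = best_x, best_f  # successful poll: accept the move
        else:
            step *= 0.5  # unsuccessful poll: contract the pattern
    return x, fx


def smoother_predict(train, log_h, x):
    """Gaussian kernel smoother; a stand-in for a trained SVM (assumption)."""
    h = math.exp(log_h)
    weights = []
    for xi, _ in train:
        z = (x - xi) / h
        weights.append(math.exp(-0.5 * z * z))
    total = sum(weights)
    if total == 0.0:  # all weights underflowed: fall back to the mean target
        return sum(yi for _, yi in train) / len(train)
    return sum(w * yi for w, (_, yi) in zip(weights, train)) / total


def loo_mse(train, log_h):
    """Leave-one-out mean squared error for one hyperparameter value."""
    total = 0.0
    for i, (xi, yi) in enumerate(train):
        rest = train[:i] + train[i + 1:]  # fit on all points except the i-th
        total += (smoother_predict(rest, log_h, xi) - yi) ** 2
    return total / len(train)


def averaged_predict(train, log_hs, x):
    """Average the predictions of several locally optimal models."""
    return sum(smoother_predict(train, lh, x) for lh in log_hs) / len(log_hs)


# Synthetic toy data for illustration only: noisy samples of a sine curve.
rng = random.Random(0)
train = []
for _ in range(25):
    xi = rng.random()
    train.append((xi, math.sin(2 * math.pi * xi) + rng.gauss(0.0, 0.3)))

# Run the search from several random starts and keep each local optimum.
local_optima = []
for _ in range(5):
    start = [rng.uniform(-4.0, 1.0)]
    best, err = pattern_search(lambda p: loo_mse(train, p[0]), start)
    local_optima.append((best[0], err))

log_hs = [lh for lh, _ in local_optima]
print(averaged_predict(train, log_hs, 0.25))
```

Because `pattern_search` accepts any callable and a parameter vector of any length, the same loop would apply to an SVM by swapping `loo_mse` for a cross-validated error over, for example, the log of the regularization and kernel parameters. Averaging predictions rather than parameters means that no single noisy local minimum determines the final model.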
