Two-stage support vector machines to protein relative solvent accessibility prediction

M.N. Nguyen, J.C. Rajapakse

doi:10.1109/cibcb.2004.1393934

Abstract

Bioinformatics techniques for relative solvent accessibility (RSA) prediction are mostly single-stage approaches: they predict the solvent accessibility of proteins using only the information available in amino acid sequences. We propose using support vector machines (SVMs) as a second stage, following existing single-stage approaches, to improve the accuracy of RSA prediction. The purpose of the second stage is to capture the contextual relationships among the solvent accessibility elements in a neighborhood when determining the solvent accessibility at a particular site. We demonstrate our approach by applying second-stage SVMs to the output of a single-stage SVM classifier. The two-stage SVM approach achieves accuracies of up to 90.4% and 90.2% on the Manesh dataset of 215 protein structures and the RS126 dataset of 126 nonhomologous globular proteins, respectively, which are better than the highest scores reported on both datasets to date.
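The second-stage idea can be illustrated with a minimal sketch of how its inputs might be assembled: each residue is described by the first-stage outputs of its neighbors. The function name `second_stage_windows`, the window half-width, the zero padding at the chain ends, and the toy scores are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Sequence


def second_stage_windows(
    first_stage_scores: Sequence[float],
    half_width: int = 3,      # hypothetical window size, not taken from the paper
    pad_value: float = 0.0,   # assumed padding for positions beyond the chain ends
) -> List[List[float]]:
    """Build one feature vector per residue from neighboring first-stage outputs."""
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    n = len(first_stage_scores)
    windows: List[List[float]] = []
    for i in range(n):
        window: List[float] = []
        for j in range(i - half_width, i + half_width + 1):
            # Use the neighbor's first-stage score if it exists, else pad.
            if 0 <= j < n:
                window.append(float(first_stage_scores[j]))
            else:
                window.append(pad_value)
        windows.append(window)
    return windows


# Toy first-stage outputs for a five-residue chain (made-up values).
scores = [0.8, -0.2, 0.5, -1.1, 0.3]
features = second_stage_windows(scores, half_width=1)
```

Under these assumptions, each row of `features` would be passed to a second classifier (an SVM in the paper's setting) trained against known RSA labels, so that the prediction at a site can draw on the predicted accessibility of its neighbors.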
