Abstract

The practical importance of predicting structural domains in unannotated amino acid sequences has increased, because such domains are valuable targets that can be readily characterized by high-throughput methods. Here we report support vector machine (SVM) prediction of domain linkers, the loop regions that separate two structural domains. The SVM training data set comprised 182 protein sequences from the SCOP database, each containing at least one domain linker region (the "all" set). This data set was further divided into long linkers (longer than 9 residues) and short linkers (9 residues or fewer). Using these data sets, we constructed three loop-length-dependent SVMs (SVM-All, SVM-Long and SVM-Short), trained on all, long and short linkers, respectively. In addition to the amino acid sequence, the new SVMs take as input a position-specific scoring matrix (PSSM) and predicted secondary structure (PSS). In a five-fold cross-validation test, the area under the receiver operating characteristic (ROC) curve (AUC), which we use as the measure of prediction performance, was 0.763, 0.759 and 0.759 for SVM-All, SVM-Long and SVM-Short, respectively. Our previous SVMs, which used only amino acid sequence information, achieved AUC values of 0.692, 0.702 and 0.605 for SVM-All, SVM-Long and SVM-Short, respectively. The new predictors therefore outperformed our previous methods (relative AUC gains of about 10%, 8% and 25%), as well as Armadillo (1) (AUC 0.610; Dumontier et al., J Mol Biol 2005) and a neural-network-based method (2) (AUC 0.642; Miyazaki et al., BMC Bioinformatics 2006). These results show that including PSSM and PSS information, in addition to sequence, improves domain linker prediction.
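The following is a minimal sketch of two pieces the abstract relies on: turning per-residue PSSM and PSS information into a fixed-length input vector, and scoring predictions by AUC. The abstract does not give the encoding, so the window size (`half_window=3`, i.e. 7 residues), the three-state H/E/C one-hot PSS encoding and all function names here are hypothetical assumptions, not the authors' implementation. AUC is computed as the Mann-Whitney statistic (the probability that a linker residue outscores a non-linker residue). Training the SVM itself is left out.

```python
from typing import List, Sequence

# ASSUMPTION: three-state predicted secondary structure (helix, strand, coil).
PSS_STATES = "HEC"
N_PSSM_COLS = 20  # one PSSM column per standard amino acid


def linker_class(length: int) -> str:
    """Long linkers have more than 9 residues; short ones have 9 or fewer."""
    return "long" if length > 9 else "short"


def window_features(pssm: Sequence[Sequence[float]], pss: str,
                    center: int, half_window: int = 3) -> List[float]:
    """Hypothetical encoding: concatenate the PSSM row and a one-hot PSS state
    for every residue in a window around `center`. Window positions beyond
    either terminus are zero-padded so every vector has the same length."""
    features: List[float] = []
    for pos in range(center - half_window, center + half_window + 1):
        if 0 <= pos < len(pss):
            features.extend(pssm[pos])
            features.extend(1.0 if pss[pos] == s else 0.0 for s in PSS_STATES)
        else:
            features.extend([0.0] * (N_PSSM_COLS + len(PSS_STATES)))
    return features


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the positive
    (linker) example scores higher; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy 5-residue protein with a flat PSSM (real PSSMs are often built with PSI-BLAST).
    pssm = [[0.0] * N_PSSM_COLS for _ in range(5)]
    x = window_features(pssm, "HHCCE", center=0)
    print(len(x))  # 7 window positions x 23 features = 161
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

In a five-fold cross-validation, `roc_auc` would be applied to the SVM scores of residues in each held-out fold. The pairwise loop is quadratic and meant only for clarity; a rank-based formula gives the same value faster on large data sets.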
