Abstract

Prediction of relative solvent accessibility (RSA) is a standard first step in predicting three-dimensional protein structures. Here we apply linear regression methods that include, for each residue, various sequence homology values as well as qualitative predictors for the query residue, one corresponding to each of the twenty canonical amino acids. We fit the 268-protein learning set with a variety of sequence homology terms, including 20-term and 6-term sequence entropy, together with the residue qualitative predictors. Estimated RSA values are then generated for the 215-protein Manesh test set. The qualitative predictors describe the actual query residue type (e.g. Gly), as opposed to the measures of sequence homology, which describe the aligned subject sequences. This is consistent with our framework of modeling a limited set of discrete and/or physically intuitive predictors. As a first attempt, we performed calculations on normalized RSA values, which incorporate the notion of fitting an explicit binary characterization of each residue as either buried or accessible. Interestingly, using the qualitative predictors yielded significant prediction accuracy. Subsequent calculations using the original RSA values gave estimates that, upon binary classification, showed accuracies comparable to those of other first-stage methods. Development of a second-stage methodology is of current interest.

Keywords: hydrophobicity, sequence entropy, buried residues, surface accessibilities, qualitative predictors
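The regression setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one indicator (qualitative) column per query residue type plus a single 20-term Shannon entropy column as the only homology term, an ordinary least-squares fit, and a hypothetical buried/accessible cutoff of 0.25. All function names, the entropy units (bits), and the cutoff are assumptions made for illustration; the 6-term entropy is omitted because the abstract does not give its residue grouping.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty canonical residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def sequence_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = np.zeros(len(AMINO_ACIDS))
    for aa in column:
        if aa in AA_INDEX:  # skip gaps and non-canonical symbols
            counts[AA_INDEX[aa]] += 1
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log2(p)).sum())


def design_matrix(residues, entropies):
    """One indicator column per query residue type, plus one entropy column."""
    X = np.zeros((len(residues), len(AMINO_ACIDS) + 1))
    for row, aa in enumerate(residues):
        X[row, AA_INDEX[aa]] = 1.0
    X[:, -1] = entropies
    return X


def fit_rsa(residues, entropies, rsa):
    """Ordinary least squares fit of RSA on the predictors.

    No separate intercept is added: the twenty indicator columns
    already sum to one in every row.
    """
    X = design_matrix(residues, entropies)
    coef, *_ = np.linalg.lstsq(X, np.asarray(rsa, dtype=float), rcond=None)
    return coef


def predict_rsa(coef, residues, entropies):
    """Estimated RSA for new residues from fitted coefficients."""
    return design_matrix(residues, entropies) @ coef


def classify(rsa_estimates, threshold=0.25):
    """True = accessible, False = buried (threshold is an assumed cutoff)."""
    return np.asarray(rsa_estimates) >= threshold
```

In this sketch the fitted coefficient for each indicator column acts as a residue-specific baseline RSA, and the entropy coefficient shifts that baseline according to how variable the aligned position is. Binary classification is applied only after the continuous estimate is produced, mirroring the abstract's description of classifying the estimated values.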

