Abstract

In this study, we propose a new peptide characterization method that accounts for both amino acid composition and the local environment of each residue. Using this approach, we parameterized the structural characteristics of peptides derived from the Escherichia coli proteome. On that basis, the performance profiles of eight statistical modelling methods were rigorously validated and comprehensively compared by applying them to two tasks in liquid chromatography: modelling the relationship between sequence structure and retention for 816 experimentally measured peptides, and predicting normalized retention times for 121,273 unmeasured peptides. The results show that regression models built with nonlinear approaches are more robust and more predictive, but also more time-consuming, than those built with linear ones. Among these methods, Gaussian process and back-propagation neural network show the best stability, unbiasedness and predictive power, so they can be used to model peptide structure–retention relationships accurately. Multiple linear regression and partial least squares regression perform worse than the nonlinear techniques but are computationally efficient, which makes them promising candidates for the qualitative problems that arise with massive data sets. In addition, by investigating descriptor importance across the different models, we found that amino acid composition shows a significant linear correlation with peptide retention time, whereas the residue environment is mainly nonlinearly correlated with peptide retention. The polar residue Arg and strongly hydrophobic amino acids such as Leu, Ile, Phe, Trp and Val are the critical factors influencing peptide retention behavior.
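
To make the two descriptor families concrete, the following Python sketch computes an amino acid composition vector and a simple residue-environment vector for one peptide. It is illustrative only: the abstract does not define the descriptors, so the function names, the window size, and the choice to measure "environment" as the fraction of hydrophobic neighbours are all assumptions made here, not the method used in the study.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
# Assumption: use the strongly hydrophobic residues named in the abstract.
HYDROPHOBIC = set("LIFWV")


def composition(peptide: str) -> list[float]:
    """Fraction of each standard amino acid in the peptide (20 values)."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide)
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


def hydrophobic_environment(peptide: str, window: int = 2) -> list[float]:
    """Hypothetical environment descriptor (not the paper's definition).

    For each residue type, average over its occurrences the fraction of
    hydrophobic residues among neighbours within `window` positions.
    """
    totals = {aa: 0.0 for aa in AMINO_ACIDS}
    seen = Counter()
    for i, aa in enumerate(peptide):
        lo, hi = max(0, i - window), min(len(peptide), i + window + 1)
        neighbours = peptide[lo:i] + peptide[i + 1:hi]
        if neighbours and aa in totals:
            totals[aa] += sum(r in HYDROPHOBIC for r in neighbours) / len(neighbours)
            seen[aa] += 1
    return [totals[aa] / seen[aa] if seen[aa] else 0.0 for aa in AMINO_ACIDS]


def describe(peptide: str) -> list[float]:
    """Concatenate both descriptor families into one 40-value feature vector."""
    return composition(peptide) + hydrophobic_environment(peptide)
```

A feature vector built this way could be passed to any of the regression methods the abstract compares. Composition ignores residue order, while the environment term depends on it, which is one plausible reading of why the two families might relate differently to retention.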

