Abstract

Background: Proteochemometrics is a new methodology that allows prediction of protein function directly from real interaction measurement data without the need for 3D structure information. Several reported proteochemometric models of ligand-receptor interactions have already yielded significant insights into various forms of biomolecular interactions. Proteochemometric models are multivariate regression models that predict binding affinity for a particular combination of ligand and protein features. Although these models have already offered interesting results in various studies, no detailed statistical evaluation of their average predictive power has been performed. In particular, variable subset selection performed to date has always relied on using all available examples, a situation also encountered in microarray gene expression data analysis.

Results: A methodology for unbiased evaluation of the predictive power of proteochemometric models was implemented, and results from applying it to two of the largest proteochemometric data sets yet reported are presented. A double cross-validation (CV) loop procedure is used to estimate the expected performance of a given design method. The unbiased performance estimates (P2) obtained for the data sets considered confirm that properly designed single proteochemometric models have useful predictive power, but that a standard design based on cross-validation may yield models with quite limited performance. The results also show that different commercial software packages employed for the design of proteochemometric models may yield very different, and therefore misleading, performance estimates. In addition, the differences among the models obtained in the double CV loop indicate that detailed chemical interpretation of a single proteochemometric model is uncertain when data sets are small.

Conclusion: The double CV loop employed offers unbiased performance estimates for a given proteochemometric modelling procedure, making it possible to identify cases where the proteochemometric design does not result in useful predictive models. Chemical interpretations of single proteochemometric models are uncertain and should instead be based on all the models selected in the double CV loop employed here.

Highlights

  • Proteochemometrics is a new methodology that allows prediction of protein function directly from real interaction measurement data without the need for 3D structure information

  • Software: Computer programs were written in MATLAB (MathWorks Inc., USA) to integrate the double loop procedure in Figure 1 with robust multivariate linear regression using partial least squares (PLS) regression and ridge regression

  • The results presented in this work indicate that it is possible to design single proteochemometric models with predictive power based
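
The double cross-validation loop mentioned above can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation. It assumes a toy model with a single descriptor and no intercept, a closed-form ridge estimate, synthetic data, and a P2 defined as one minus the ratio of the outer-loop squared prediction error to the total sum of squares around the mean. All function names, fold counts, and penalty values are hypothetical. The point it shows is that the penalty is selected in an inner loop that never sees the outer test examples.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for the toy one-descriptor model y ~ b * x."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def folds(indices, k):
    """Split a list of indices into k disjoint folds of near-equal size."""
    return [indices[i::k] for i in range(k)]

def cv_press(xs, ys, idx, lam, k):
    """Sum of squared cross-validated prediction errors for one penalty value."""
    press = 0.0
    for test in folds(idx, k):
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        b = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        press += sum((ys[i] - b * xs[i]) ** 2 for i in test)
    return press

def double_cv(xs, ys, lambdas, k_outer=5, k_inner=4, seed=0):
    """Return (P2, penalty chosen in each outer fold). P2 definition is assumed."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    press, chosen = 0.0, []
    for test in folds(idx, k_outer):
        held_out = set(test)
        design = [i for i in idx if i not in held_out]
        # Inner loop: penalty selection uses the design examples only.
        best = min(lambdas, key=lambda lam: cv_press(xs, ys, design, lam, k_inner))
        chosen.append(best)
        b = ridge_slope([xs[i] for i in design], [ys[i] for i in design], best)
        # Outer loop: score on examples never used for selection or fitting.
        press += sum((ys[i] - b * xs[i]) ** 2 for i in test)
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / total, chosen

# Synthetic data, for illustration only.
rng = random.Random(1)
xs = [rng.uniform(-2, 2) for _ in range(60)]
ys = [1.5 * x + rng.gauss(0, 0.3) for x in xs]
p2, chosen = double_cv(xs, ys, lambdas=[0.0, 0.1, 1.0, 10.0])
print(round(p2, 3), chosen)
```

The list `chosen` may differ from one outer fold to the next, which mirrors the observation that the models selected in a double CV loop can vary when data sets are small.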


Summary

Introduction

Proteochemometrics is a new methodology that allows prediction of protein function directly from real interaction measurement data without the need for 3D structure information. Current computational methods for prediction of protein function rely to a large extent on amino acid sequence similarity with proteins of known function. Other computational methods include structural similarity comparisons and molecular dynamics simulations (e.g. molecular docking). Although these latter methods are powerful and may in general offer important 3D mechanistic explanations of interaction and function, they require access to the protein 3D structure. Computational determination of a 3D structure is well known to be resource-demanding and error-prone, and it generally requires prior knowledge, such as the 3D structure of a homologous protein. This bottleneck makes it important to develop new methods for prediction of protein function when a 3D model is not available.

