Abstract

Connecting the dots among the amino acid sequence of a protein, its structure, and its function remains a central theme in molecular biology, as it would have many applications in the treatment of illnesses related to misfolding or protein instability. As a result of high-throughput sequencing methods, biologists currently live in a protein sequence-rich world. However, our knowledge of protein structure based on experimental data remains comparatively limited. As a consequence, protein structure prediction has established itself as a very active field of research that aims to fill this gap. This field, once thought to be reserved for theoretical biophysicists, is constantly reinventing itself, borrowing ideas from an ever-increasing assembly of scientific domains: biology, chemistry, (statistical) physics, mathematics, computer science, statistics, bioinformatics, and, more recently, data science. We review the recent progress arising from this integration of knowledge, from the development of specialized computer architectures that allow for longer timescales in physics-based simulations of protein folding to the recent advances in predicting contacts in proteins based on the detection of coevolution in very large data sets of aligned protein sequences.

Highlights

  • Proteins are macromolecules essential to all organisms, as they participate in most processes within cells, thereby sustaining life

  • The amino acid sequence of a protein is defined by the nucleotide sequence of its gene

  • We described attempts to derive a physics-based solution to the structure prediction problem; our tone was meant to be positive and optimistic, but we cannot deny that those attempts have their limitations


Summary

Introduction

Proteins are macromolecules essential to all organisms, as they participate in most processes within cells, thereby sustaining life. Dill et al. recently illustrated that adding some semireliable external information to the potential that steers the molecular simulation enables accurate prediction of the native conformations of small proteins within the context of CASP (refs. 52, 71). This information is provided in the form of binary residue contacts deduced from the protein sequence itself. The starting idea was to derive statistical preferences for amino acids to be within a specific secondary structure based on known protein structures and to use those preferences for predictions. This inference, called the inverse problem, proved harder than expected. In biology, it will have an impact on many diseases related to protein misfolding, from neurodegenerative diseases to diabetes, through the prediction of the stability of mutants (EVmutation website: http://marks.hms.harvard.edu), and perhaps even on personalized medicine.
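The idea of deriving statistical preferences of amino acids for a given secondary structure can be illustrated with a minimal sketch. It is not the method of any particular study cited here. It assumes a propensity defined as the frequency of a residue in a state divided by the overall frequency of that state, and a simple windowed vote for prediction. The function names, the three-state alphabet (H, E, C), the window size, and the tiny training set are all made up for illustration.

```python
from collections import Counter

def propensities(examples):
    """Return {(residue, state): propensity} from (sequence, states) pairs.

    Propensity is assumed here to be P(state | residue) / P(state).
    """
    pair_counts = Counter()
    residue_counts = Counter()
    state_counts = Counter()
    for seq, states in examples:
        if len(seq) != len(states):
            raise ValueError("sequence and state strings must have equal length")
        for aa, ss in zip(seq, states):
            pair_counts[(aa, ss)] += 1
            residue_counts[aa] += 1
            state_counts[ss] += 1
    total = sum(state_counts.values())
    return {
        (aa, ss): (n / residue_counts[aa]) / (state_counts[ss] / total)
        for (aa, ss), n in pair_counts.items()
    }

def predict(seq, table, states="HEC", window=3):
    """Give each position the state with the highest mean propensity in a window."""
    half = window // 2
    out = []
    for i in range(len(seq)):
        neighbours = seq[max(0, i - half): i + half + 1]
        scores = {
            s: sum(table.get((aa, s), 0.0) for aa in neighbours) / len(neighbours)
            for s in states
        }
        out.append(max(states, key=lambda s: scores[s]))
    return "".join(out)

# Hypothetical toy data, not taken from any real structure database.
TOY = [
    ("AAEELLKK", "HHHHHHHH"),
    ("VVIIYYTT", "EEEEEEEE"),
    ("GGPPNNSS", "CCCCCCCC"),
]

if __name__ == "__main__":
    table = propensities(TOY)
    print(predict("AAEEVVGG", table))
```

In practice the counts would come from a large set of experimentally determined structures, and a table this small says nothing about real residue preferences.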

