Abstract

AlphaFold2 and RoseTTAfold represent a transformative advance for predicting protein structure. They are able to make very high-quality predictions given a high-quality alignment of the protein sequence with related proteins. These predictions are now readily available via the AlphaFold database of predicted structures and AlphaFold or RoseTTAfold Colaboratory notebooks for custom predictions. However, predictions for some species tend to be lower confidence than those for model organisms. Problematic species include Trypanosoma cruzi and Leishmania infantum: important unicellular eukaryotic human parasites in an early-branching eukaryotic lineage. The cause appears to be poor sampling of this branch of life (Discoba) in the protein sequence databases used for the AlphaFold database and ColabFold. Here, by comprehensively gathering openly available protein sequence data for Discoba species, significant improvements to AlphaFold2 protein structure prediction over the AlphaFold database and ColabFold are demonstrated. This is made available as an easy-to-use tool for the parasitology community in the form of Colaboratory notebooks for generating multiple sequence alignments and AlphaFold2 predictions of protein structure for Trypanosoma, Leishmania and related species.

Highlights

  • Machine learning approaches to protein structure prediction have crossed a critical success threshold

  • Multiple sequence alignments (MSAs) are the input for AlphaFold2 [1] and RoseTTAfold [2], with AlphaFold2 reaching the highest prediction accuracy at the most recent Critical Assessment of protein Structure Prediction (CASP) competition (CASP14 [3]), an accuracy comparable to experimental protein structure determination

  • Lower predicted local distance difference test score (pLDDT) could be explained by more disordered protein domains, as predictions for these regions correlate with low pLDDT [1]


Introduction

Machine learning approaches to protein structure prediction have crossed a critical success threshold. While predicting the three-dimensional structure of a protein from sequence alone is still an unsolved problem, a multiple sequence alignment (MSA) of the target protein sequence with related proteins provides key additional information. Cutting-edge approaches using such MSAs have the potential to reach very high accuracy. MSAs are the input for AlphaFold2 [1] and RoseTTAfold [2], with AlphaFold2 reaching the highest prediction accuracy at the most recent Critical Assessment of protein Structure Prediction (CASP) competition (CASP14 [3]), an accuracy comparable to experimental protein structure determination. AlphaFold2-predicted structures for the near-whole proteome of 21 species [4] have been made publicly available, and tools like ColabFold [5] and the official AlphaFold Colaboratory notebook [6] make custom predictions accessible.
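Because the MSA is the key input, it can help to inspect how well a query protein is covered by related sequences before running a prediction. The following is a minimal sketch, not part of the published notebooks: it assumes an alignment in A3M format (FASTA-like, with lowercase letters marking insertions relative to the query and `-` marking gaps), and the function names `parse_a3m` and `column_coverage` are hypothetical helpers introduced only for illustration.

```python
from typing import List, Tuple


def parse_a3m(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA-like A3M text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def column_coverage(records: List[Tuple[str, str]]) -> List[float]:
    """Fraction of sequences with a residue (not a gap) at each query column.

    Assumes the first record is the query sequence.
    """
    # Drop lowercase insertion characters so every row aligns to the query.
    aligned = ["".join(c for c in seq if not c.islower()) for _, seq in records]
    query_len = len(aligned[0])
    coverage = []
    for i in range(query_len):
        filled = sum(1 for row in aligned if i < len(row) and row[i] != "-")
        coverage.append(filled / len(aligned))
    return coverage


# Toy alignment, invented purely for illustration.
EXAMPLE_A3M = """>query
MKTA
>hit1
MK-A
>hit2
MrKTA
>hit3
--TA
"""

records = parse_a3m(EXAMPLE_A3M)
print("MSA depth:", len(records))
print("Per-column coverage:", column_coverage(records))
```

A shallow alignment, or one with long stretches of low coverage, is the kind of input for which lower-confidence predictions might be expected; the thresholds that matter in practice are not specified here.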
