A resource for improved predictions of Trypanosoma and Leishmania protein three-dimensional structure.

Richard John Wheeler

doi:10.1371/journal.pone.0259871

Open Access

https://doi.org/10.1371/journal.pone.0259871

Abstract

AlphaFold2 and RoseTTAfold represent a transformative advance for predicting protein structure. They are able to make very high-quality predictions given a high-quality alignment of the protein sequence with related proteins. These predictions are now readily available via the AlphaFold database of predicted structures, and via AlphaFold or RoseTTAfold Colaboratory notebooks for custom predictions. However, predictions for some species tend to be lower confidence than those for model organisms. Problematic species include Trypanosoma cruzi and Leishmania infantum: important unicellular eukaryotic human parasites in an early-branching eukaryotic lineage. The cause appears to be poor sampling of this branch of life (Discoba) in the protein sequence databases used for the AlphaFold database and ColabFold. Here, by comprehensively gathering openly available protein sequence data for Discoba species, significant improvements to AlphaFold2 protein structure prediction over the AlphaFold database and ColabFold are demonstrated. This is made available as an easy-to-use tool for the parasitology community in the form of Colaboratory notebooks for generating multiple sequence alignments and AlphaFold2 predictions of protein structure for Trypanosoma, Leishmania and related species.

Highlights

Machine learning approaches to protein structure prediction have crossed a critical success threshold.
Multiple sequence alignments (MSAs) are the input for AlphaFold2 [1] and RoseTTAfold [2], with AlphaFold2 reaching the highest prediction accuracy at the most recent Critical Assessment of protein Structure Prediction (CASP) competition (CASP14 [3]), an accuracy comparable to experimental protein structure determination.
A lower predicted local distance difference test (pLDDT) score could be explained by more disordered protein domains, as predictions for these regions correlate with low pLDDT [1].
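
As a rough illustration of how pLDDT can be inspected in practice, the sketch below averages per-residue pLDDT for one predicted structure. It assumes the prediction is a PDB-format file in which pLDDT is stored in the B-factor column, as is the case for AlphaFold output. The function names `mean_plddt` and `atom_line` are hypothetical helpers written for this example, not part of the paper's notebooks or of any AlphaFold API.

```python
def mean_plddt(pdb_text: str) -> float:
    """Mean pLDDT over C-alpha atoms, read from the B-factor column.

    Assumption: the file follows the fixed-width PDB ATOM layout and
    stores pLDDT in the B-factor field (columns 61-66).
    """
    scores = []
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16 (0-based slice 12:16).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return sum(scores) / len(scores)


def atom_line(serial: int, name: str, res: str, resseq: int, plddt: float) -> str:
    """Build a minimal fixed-width ATOM record (hypothetical demo helper).

    Coordinates are set to zero and occupancy to one; only the column
    positions matter for this example.
    """
    return (
        f"ATOM  {serial:>5d} {name:<4s} {res:>3s} A{resseq:>4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


# Toy two-residue structure; the values are illustrative, not real data.
demo_pdb = "\n".join([
    atom_line(1, "N", "MET", 1, 10.0),   # ignored: not a C-alpha atom
    atom_line(2, "CA", "MET", 1, 90.0),
    atom_line(3, "CA", "SER", 2, 50.0),
])
demo_mean = mean_plddt(demo_pdb)
```

Only C-alpha atoms are counted so that each residue contributes once. Comparing such a mean between two predictions of the same protein is one simple way to see whether a different MSA changed the model's confidence.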

Summary

Introduction

Machine learning approaches to protein structure prediction have crossed a critical success threshold. While predicting the three-dimensional structure of a protein from sequence alone is still an unsolved problem, a multiple sequence alignment (MSA) of the target protein sequence with related proteins provides key additional information. Cutting-edge approaches using such MSAs have the potential to reach very high accuracy. MSAs are the input for AlphaFold2 [1] and RoseTTAfold [2], with AlphaFold2 reaching the highest prediction accuracy at the most recent Critical Assessment of protein Structure Prediction (CASP) competition (CASP14 [3]), an accuracy comparable to experimental protein structure determination. AlphaFold2-predicted structures for the near-whole proteome of 21 species [4] have been made publicly available, and tools like ColabFold [5] and the official AlphaFold Colaboratory notebook [6] make custom predictions accessible.

Journal: PloS one
Publication Date: Nov 11, 2021
Citations: 34
License type: CC BY 4.0