Abstract
Background: Of the 5 484 predicted proteins of Plasmodium falciparum, the main causative agent of malaria, about 60% do not have sufficient sequence similarity with proteins in other organisms to warrant provision of functional assignments. Non-homology methods are thus needed to obtain functional clues for these uncharacterized genes.

Results: We present PlasmoDraft, a database of Gene Ontology (GO) annotation predictions for P. falciparum genes based on postgenomic data. Predictions of PlasmoDraft are achieved with a Guilt By Association method named Gonna. This involves (1) a predictor that proposes GO annotations for a gene based on the similarity of its profile (measured with transcriptome, proteome or interactome data) with genes already annotated by GeneDB; (2) a procedure that estimates the confidence of the predictions achieved with each data source; (3) a procedure that combines all data sources to provide a global summary and confidence estimate of the predictions. Gonna has been applied to all P. falciparum genes using most publicly available transcriptome, proteome and interactome data sources. Gonna provides predictions for numerous genes without any annotations. For example, 2 434 genes without any annotations in the Biological Process ontology are associated with specific GO terms (e.g. Rosetting, Antigenic variation), and among these, 841 have confidence values above 50%. In the Cellular Component and Molecular Function ontologies, 1 905 and 1 540 uncharacterized genes are associated with specific GO terms, respectively (740 and 329 with confidence values above 50%).

Conclusion: All predictions along with their confidence values have been compiled in PlasmoDraft, which thus provides an extensive database of GO annotation predictions that can be achieved with these data sources. The database can be accessed in different ways. A global view allows for a quick inspection of the GO terms that are predicted with high confidence, depending on the various data sources. A gene view and a GO term view allow for the search of potential GO terms attached to a given gene, and genes that potentially belong to a given GO term.
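To make step (1) of the method concrete, the following is a minimal sketch of a Guilt By Association predictor for a single data source. It is not Gonna's actual implementation: the text above does not specify the similarity measure, the number of neighbours, or the scoring rule, so the use of Pearson correlation, the parameter `k`, the function names `pearson` and `predict_go_terms`, and the toy gene identifiers and profiles are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def predict_go_terms(query_profile, annotated, k=3):
    """Propose GO terms for an unannotated gene from its k most similar annotated genes.

    annotated maps a gene id to (profile, set of GO terms).
    Returns a dict: GO term -> fraction of the k neighbours carrying that term.
    This fraction is a raw vote, not a calibrated confidence value.
    """
    ranked = sorted(
        annotated.items(),
        key=lambda item: pearson(query_profile, item[1][0]),
        reverse=True,
    )
    neighbours = ranked[:k]
    votes = Counter()
    for _gene, (_profile, terms) in neighbours:
        votes.update(terms)
    return {term: count / len(neighbours) for term, count in votes.items()}


# Toy data (hypothetical genes and expression profiles, for illustration only).
annotated_genes = {
    "geneA": ([1, 2, 3, 4], {"antigenic variation"}),
    "geneB": ([2, 4, 6, 8.5], {"antigenic variation", "rosetting"}),
    "geneC": ([4, 3, 2, 1], {"translation"}),
    "geneD": ([1, 3, 2, 4], {"rosetting"}),
}

scores = predict_go_terms([1, 2, 3, 5], annotated_genes, k=2)
print(scores)
```

In this toy run the two nearest neighbours are geneB and geneA, so "antigenic variation" receives a vote of 1.0 and "rosetting" a vote of 0.5. Steps (2) and (3), which estimate a confidence per data source and combine the sources, are not covered by this sketch.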
Highlights
Of the 5 484 predicted proteins of Plasmodium falciparum, the main causative agent of malaria, about 60% do not have sufficient sequence similarity with proteins in other organisms to warrant provision of functional assignments
We present PlasmoDraft (http://atgc.lirmm.fr/PlasmoDraft/), a database of Gene Ontology (GO) annotation predictions for P. falciparum achieved by applying a Guilt by Association (GBA) predictor named Gonna to several transcriptome, proteome and protein-protein interaction data sets
We examined the 986 genes without Biological Process (BP) annotations that were predicted with a high Global Degree of Belief (GDB) for specific BP terms
Summary
Of the 5 484 coding genes of P. falciparum, the main causative agent of malaria (http://plasmodb.org version 5.4), about 60% do not have sufficient sequence similarity to proteins in other organisms to warrant provision of functional assignments. Almost two-thirds of the proteins appear to be specific to P. falciparum, a much higher proportion than observed in other eukaryotes [2]. This is likely exacerbated by the large evolutionary distance between P. falciparum and other sequenced eukaryotes, which makes homology detection difficult. Non-homology methods are needed to obtain functional clues for these uncharacterized genes [7].