EPIC-DB: a proteomics database for studying Apicomplexan organisms

Carlos J Madrid-Aliste,Louis M Weiss,Joseph M Dybas,Ruth Hogue Angeletti,Andras Fiser,Kami Kim,Istvan Simon

doi:10.1186/1471-2164-10-38


Open Access

https://doi.org/10.1186/1471-2164-10-38


Abstract

Background: High throughput proteomics experiments are useful for analyzing the protein expression of an organism, identifying the correct gene structure of a genome, or locating possible post-translational modifications within proteins. High throughput methods require publicly accessible and easily queried databases that store, display, and analyze the large volume of data efficiently and logically.

Description: EPICDB is a publicly accessible, queryable, relational database that organizes and displays experimental, high throughput proteomics data for Toxoplasma gondii and Cryptosporidium parvum. Along with detailed information on mass spectrometry experiments, the database also provides antibody experimental results and analysis of functional annotations, comparative genomics, and aligned expressed sequence tag (EST) and genomic open reading frame (ORF) sequences. The database contains all available alternative gene datasets for each organism, which together comprise a complete theoretical proteome for that organism, and all data are referenced to these sequences. The database is structured around clusters of protein sequences, which allows for the evaluation of redundancy, protein prediction discrepancies, and possible splice variants. The database can be expanded to include genomes of other organisms for which proteome-wide experimental data are available.

Conclusion: EPICDB is a comprehensive database of genome-wide T. gondii and C. parvum proteomics data and incorporates many features that allow for the analysis of entire proteomes and/or the annotation of specific protein sequences. EPICDB is complementary to other genomics databases of these organisms by offering complete mass spectrometry analysis on a comprehensive set of all available protein sequences.

Highlights

High throughput proteomics experiments are useful for analyzing the protein expression of an organism, identifying the correct gene structure of a genome, or locating possible post-translational modifications within proteins
Database content: EPICDB is a collection of experimental and computational data that are referenced to all available protein sequences for T. gondii and C. parvum, which represent the theoretical proteomes of the respective organisms
To the right of each protein sequence, a check-mark indicates the presence of information on mass spectrometry experiments, antibody production experiments, expressed sequence tag (EST) alignments, functional annotations (PFAM domains, transmembrane segments, and signal peptides), or comparative genomics for that sequence
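
The cluster-centred organization and the per-sequence check-marks described above can be illustrated with a minimal sketch. It is not the actual EPICDB schema: the table names (`cluster`, `protein`, `evidence`), the column names, the evidence category labels, the sequence identifiers, and the helper functions are all hypothetical, and SQLite stands in for whatever relational engine the database really uses.

```python
import sqlite3

# Hypothetical evidence categories, mirroring the check-mark columns described above.
EVIDENCE_KINDS = (
    "mass_spec",
    "antibody",
    "est_alignment",
    "functional_annotation",
    "comparative_genomics",
)


def build_demo_db():
    """Create an in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cluster (id INTEGER PRIMARY KEY);
        CREATE TABLE protein (
            id         TEXT PRIMARY KEY,
            cluster_id INTEGER NOT NULL REFERENCES cluster(id),
            organism   TEXT NOT NULL
        );
        CREATE TABLE evidence (
            protein_id TEXT NOT NULL REFERENCES protein(id),
            kind       TEXT NOT NULL
        );
        """
    )
    # Made-up rows: two clusters, three sequences, three pieces of evidence.
    conn.executemany("INSERT INTO cluster (id) VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [
            ("seqA", 1, "T. gondii"),
            ("seqB", 1, "T. gondii"),
            ("seqC", 2, "C. parvum"),
        ],
    )
    conn.executemany(
        "INSERT INTO evidence VALUES (?, ?)",
        [("seqA", "mass_spec"), ("seqA", "est_alignment"), ("seqC", "antibody")],
    )
    return conn


def evidence_flags(conn):
    """Return {protein_id: {kind: bool}}, one flag per evidence category."""
    flags = {
        pid: dict.fromkeys(EVIDENCE_KINDS, False)
        for (pid,) in conn.execute("SELECT id FROM protein")
    }
    for pid, kind in conn.execute("SELECT protein_id, kind FROM evidence"):
        # Ignore rows that point at unknown sequences or unknown categories.
        if pid in flags and kind in flags[pid]:
            flags[pid][kind] = True
    return flags


def cluster_members(conn, cluster_id):
    """List the sequence identifiers grouped into one cluster."""
    rows = conn.execute(
        "SELECT id FROM protein WHERE cluster_id = ? ORDER BY id", (cluster_id,)
    )
    return [row[0] for row in rows]
```

In this sketch, calling `evidence_flags(build_demo_db())` gives one row of flags per sequence, the analogue of a row of check-marks, and `cluster_members` shows how sequences that share a cluster could be compared for redundancy or conflicting gene predictions.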

Summary

Conclusion

High throughput mass spectrometry data have a variety of important applications in genomics and proteomics. Because vast amounts of genomic sequence data are available but comparatively little experimentally derived protein sequence data, computational gene finders are needed, and high throughput mass spectrometry data can inform and support their predictions. High throughput mass spectrometry is an emerging, efficient method for comprehensive, proteome-wide analysis of an organism, which is an important step in identifying protein interactions, studying protein expression levels, elucidating alternative splice sites, or predicting potential chemotherapeutic targets.

JMD performed programming tasks and wrote the article. KK and IS consulted on the project and curated the database.
