Abstract

Over the last decade, new species of Protozoa have been sequenced and deposited in GenBank. Analyzing large amounts of genomic data, especially data produced by Next Generation Sequencing (NGS), is not a trivial task, considering that researchers have traditionally focused their studies on a few genes, gene families, or small genomes. To facilitate the extraction of information from genomic data, we developed a database system called ProtozoaDB, which included five Protozoa genomes in its first version. In the present study, we present a new version, ProtozoaDB 2.0, which now holds the genomes of 22 pathogenic Protozoa. The system has been fully remodeled to accommodate new tools and a broader view of the data, and now includes a number of analyses: (i) similarities with other databases (model organisms, the Conserved Domains Database, and the Protein Data Bank); (ii) visualization of KEGG metabolic pathways; (iii) the protein structure from PDB; (iv) homology inferences; (v) the search for related publications in PubMed; (vi) superfamily classification; and (vii) phenotype inferences based on comparisons with model organisms. ProtozoaDB 2.0 supports RESTful Web Services to make data access easier. These services were written in Ruby using Ruby on Rails (RoR). This new version also allows a more detailed analysis of the object of study and expands the number of genomes and proteomes available to the scientific community. In our case study, a group of prenyltransferase proteins already described in the literature was found to be a good drug target for Trypanosomatids.
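Because the abstract mentions RESTful Web Services, a client-side sketch may help show what this kind of access could look like. The example below is only an illustration under stated assumptions: the base URL, the `proteins.json` resource, the `organism` parameter, and the response fields `id` and `organism` are all hypothetical and are not taken from the ProtozoaDB 2.0 documentation. It builds a GET URL and decodes a JSON body offline, without contacting any server.

```python
import json
from urllib.parse import urlencode, urljoin

# Hypothetical base URL: the source does not document the real endpoints.
BASE_URL = "https://protozoadb.example.org/api/"


def build_url(resource, **params):
    """Return the GET URL for a resource, with query parameters sorted by name."""
    url = urljoin(BASE_URL, resource)
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url


def parse_proteins(body):
    """Decode a JSON array of protein records into (id, organism) pairs.

    The field names "id" and "organism" are assumptions for illustration.
    """
    return [(rec["id"], rec["organism"]) for rec in json.loads(body)]


# Offline demonstration with a made-up response body (not real ProtozoaDB data).
sample_body = '[{"id": "P1", "organism": "Trypanosoma cruzi"}]'
url = build_url("proteins.json", organism="Trypanosoma cruzi")
records = parse_proteins(sample_body)
```

In a real client, the body would come from an HTTP GET on the URL returned by `build_url`. The request is left out here so that the sketch does not presume anything about the actual service.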

Highlights

  • Over the last decade, new species of Protozoa were sequenced and deposited in GenBank [1,2,3,4]. The availability of the primary genome sequence is a good starting point for the community to contribute further analyses (e.g., identification and functional annotation of coding sequences, as well as comparative genomics analysis) in order to infer new information on the biology of these organisms

  • The new version contains: (i) 193,559 genes; (ii) 218,100 proteins; (iii) 26,101 homologous groups (21,119 orthologous groups and 4,982 paralogous groups) obtained by OrthoMCL analysis (Figure 2); and (iv) 195 phenotypes inferred by crossing information with the Saccharomyces Database

Summary

Introduction

Over the last decade, new species of Protozoa were sequenced and deposited in GenBank [1,2,3,4]. The availability of the primary genome sequence is a good starting point for the community to contribute further analyses (e.g., identification and functional annotation of coding sequences, as well as comparative genomics analysis) in order to infer new information on the biology of these organisms. Analyzing large amounts of genomic data, especially data produced by Next Generation Sequencing (NGS), is not a trivial task. Ongoing advances in NGS technology make the sequencing of more and more eukaryotic genomes a reality, giving rise to new paradigms: either the development and improvement of semi-automatic analysis and annotation systems for this huge amount of data, or an object-view concept in which raw reads are the main, fixed object, and assemblies with their annotations act as dynamically changing views of that object [5]. The processes involved in sequencing and preparing genomic information can be represented in a way similar to the life cycle of software (Figure 1). The first step is data acquisition, which can be performed by: (i) downloading from public databases; and (ii) sequencing across multiple
