Abstract

Currently available sequencing technologies enable quick and economical sequencing of many new eukaryotic parasite (apicomplexan or kinetoplastid) species or strains. Compared to SNP calling approaches, de novo assembly of these genomes enables researchers to additionally determine insertion, deletion and recombination events as well as to detect complex sequence diversity, such as that seen in variable multigene families. However, there are currently no automated eukaryotic annotation pipelines offering the range of results required to facilitate such analyses. A suitable pipeline needs to perform evidence-supported gene finding as well as functional annotation and pseudogene detection, through to the generation of output ready to be submitted to a public database. Moreover, no current tool includes quick yet informative comparative analyses and a first-pass visualization of both annotation and analysis results. To address these needs, we have developed the Companion web server (http://companion.sanger.ac.uk), which provides parasite genome annotation as a service using a reference-based approach. We demonstrate the use and performance of Companion by annotating two Leishmania and Plasmodium genomes as typical parasite cases and evaluate the results against manually annotated references.

Highlights

  • The availability, extent and quality of genomic annotations are of crucial importance for powerful genomics methods like comparative studies, expression analysis or even simple gene knockdown [1]

  • To address the demand for quick, automatically generated parasite genome annotations, we have developed Companion (COMprehensive Parasite ANnotatION) as a web server

  • For each of the evaluation species, the curated genome annotation of a related species was used as a reference


Summary

Introduction

The availability, extent and quality of genomic annotations are of crucial importance for powerful genomics methods such as comparative studies, expression analysis or even simple gene knockdown [1]. Typical characteristics to examine are similarities and differences in gene content, phylogenetic relationships and synteny. To further characterize these differences, functional information about the genes involved is required, encompassing protein product descriptions and controlled vocabulary terms, e.g. for function and localization [2,3]. Many tools exist to perform the basic task of ab initio gene finding [6,7,8,9]; they are optimized to accurately predict the boundaries of all genes and their exons in the genome sequence. Most of these tools use machine-learning approaches that require training with manually curated gene models and/or extrinsic evidence such as RNA-seq transcripts. Often underestimated, yet nontrivial [11], is the generation of a suitable output format for submission to public databases.
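Gene finders predict gene and exon boundaries, but turning those predictions into a submission-ready format is a separate step. As a minimal illustration, assuming GFF3 as the target format (this summary does not say which formats Companion emits), the Python sketch below serializes one hypothetical gene model into gene, mRNA, exon and CDS rows and computes the CDS phase column. `GeneModel`, `cds_phases` and `to_gff3` are hypothetical names chosen for this example; this is not Companion's implementation, and it assumes every exon is fully coding (no UTRs).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Hypothetical minimal gene model: exon coordinates are 1-based and inclusive,
# and every exon is assumed to be fully coding (no UTRs) to keep the sketch short.
@dataclass
class GeneModel:
    seqid: str
    gene_id: str
    strand: str  # "+" or "-"
    exons: List[Tuple[int, int]] = field(default_factory=list)


def cds_phases(exons: List[Tuple[int, int]], strand: str) -> Dict[Tuple[int, int], int]:
    """GFF3 phase = bases to skip at the start of a CDS segment to reach the next codon."""
    # Walk segments in transcription order: descending coordinates on the minus strand.
    ordered = sorted(exons, reverse=(strand == "-"))
    phases: Dict[Tuple[int, int], int] = {}
    consumed = 0
    for seg in ordered:
        phases[seg] = (3 - consumed % 3) % 3
        consumed += seg[1] - seg[0] + 1
    return phases


def to_gff3(model: GeneModel, source: str = "sketch") -> List[str]:
    """Serialize one gene model as gene -> mRNA -> exon/CDS rows (9 tab-separated columns)."""
    exons = sorted(model.exons)
    start, end = exons[0][0], exons[-1][1]
    mrna_id = f"{model.gene_id}.1"

    def row(ftype: str, s: int, e: int, phase: str, attrs: str) -> str:
        # seqid, source, type, start, end, score, strand, phase, attributes
        cols = [model.seqid, source, ftype, str(s), str(e), ".", model.strand, phase, attrs]
        return "\t".join(cols)

    lines = [
        "##gff-version 3",
        row("gene", start, end, ".", f"ID={model.gene_id}"),
        row("mRNA", start, end, ".", f"ID={mrna_id};Parent={model.gene_id}"),
    ]
    phases = cds_phases(exons, model.strand)
    for i, (s, e) in enumerate(exons, start=1):
        lines.append(row("exon", s, e, ".", f"ID={mrna_id}.exon{i};Parent={mrna_id}"))
        # GFF3 lets the segments of one discontinuous CDS share a single ID.
        lines.append(row("CDS", s, e, str(phases[(s, e)]), f"ID={mrna_id}.cds;Parent={mrna_id}"))
    return lines


if __name__ == "__main__":
    model = GeneModel("chr1", "GENE001", "+", [(100, 200), (300, 450)])
    print("\n".join(to_gff3(model)))
```

Even this toy case shows why output generation is easy to get wrong: the phase of each CDS segment depends on the cumulative length of the preceding segments in transcription order, which is reversed on the minus strand.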
