Abstract

Manual annotation (the "museum" model of annotation) relies on a small group of specialized curators to catalogue and classify genes according to their functional roles. This is both costly and time consuming, and is therefore used only for model organisms with sufficient funding. Smaller research communities often have to rely on other models of annotation, mainly automated annotation (the "factory" model, e.g. Ensembl) and the "jamboree" model (in which a group of leading biologists from the community and bioinformaticians come together for a short, intensive annotation workshop). At the Wellcome Trust Sanger Institute (WTSI), the Havana team provides high-quality manual annotation of finished vertebrate genome sequences, namely human, mouse and zebrafish. We also curate specific finished regions, such as the MHC in dog, cow and pig, whose whole genomes have been assembled from unfinished BACs or from whole genome shotgun sequences. In addition, Havana has hosted annotation jamborees for the cow (Bos taurus) and pig (Sus scrofa) genomes. During those sessions, the research community had the opportunity to annotate their genes of interest under expert guidance, using the custom-written, publicly available Otterlace annotation system and the unified manual annotation guidelines. By making use of the tools and skills acquired during the cow and pig jamborees, the delegates can continue annotating their genomes remotely. For the pig genome, a highly contiguous physical map has been generated by an international effort of four laboratories (available in Pre!Ensembl) and is being used as a substrate for the swine genome sequencing project. Upcoming vertebrate genomes will be sequenced to a high depth of coverage with next generation sequencing technologies (e.g. Illumina, 454, SOLiD) but will have the drawback of not being manually finished. Manual annotation will be more accurate than automated prediction at coping with the assembly problems derived from these high-coverage but unfinished (or automatically pre-finished) genomes. Once these inherent assembly errors are corrected and the gene structures are accurately identified with manual annotation, the curated genes will be incorporated and merged with the predicted gene models in Ensembl to provide a unified view of the landscape of vertebrate genomes. I will present an introduction to our manual annotation system and our experience using it for annotation jamborees at the WTSI.

Highlights

  • The cataloguing and classification of genes, performed manually by specialised curators, produces high-quality annotation but is costly, time consuming and used for just a handful of model genomes

  • Manual annotation will cope better than the automated systems with the assembly problems inherent to unfinished sequences and will be more accurate at identifying correct gene structures. Once this step is achieved, the curated genes will be incorporated and merged with the predicted genes in Ensembl to provide a consistent view of the landscape of vertebrate genomes

  • Fragments 1, 2 and 3 have been curated as part of a novel gene in pig that is homologous to human SLC2A13

Summary

I) Introduction

The cataloguing and classification of genes, performed manually by specialised curators, produces high-quality annotation but is costly, time consuming and used for just a handful of model genomes. With next generation sequencing technologies becoming more accessible, new genomes can be sequenced to a high depth of coverage but will not be manually finished. For these genomes, manual annotation will cope better than automated systems with the assembly problems inherent to unfinished sequences and will be more accurate at identifying correct gene structures. Once this step is achieved, the curated genes will be incorporated and merged with the predicted genes in Ensembl to provide a consistent view of the landscape of vertebrate genomes.

VI) Concluding remarks