Abstract

As part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to help researchers get started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects of a general assembly and annotation project from start to finish. Intrinsic properties of genomes are discussed, as is the importance of using high-quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR).

Highlights

  • The advice presented here is based on a need identified while working on the ELIXIR-EXCELERATE task “Capacity Building in Genome Assembly and Annotation”

  • If a genome draft adds significant value in addressing the problem, one should consider whether sufficient financial and computational resources are available to produce a genome of satisfactory quality. For those who have decided to embark upon a genome assembly and/or annotation project, we provide here a set of good practices intended to facilitate project completion

  • Elsewhere in the paper they are given as "up to 10% or 15%." Scaffolding and gap filling: the very end of this section states, “Be aware that any changes to a genome assembly will most likely necessitate annotation to be re-started from scratch, and you should be sure to ‘freeze’ the assembly completely before starting annotation”
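
One way to make the advice about "freezing" an assembly concrete is to record a checksum of the assembly file before annotation begins and to verify it later. The following is a minimal sketch under stated assumptions, not a method described in the paper: the function names `assembly_checksum` and `is_unchanged` are hypothetical, and the assembly is assumed to be a single FASTA file on disk.

```python
import hashlib


def assembly_checksum(fasta_path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Hypothetical helper: chunked reading keeps memory use low for
    large assembly files.
    """
    digest = hashlib.sha256()
    with open(fasta_path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(fasta_path, recorded_digest):
    """Check that the file still matches the digest recorded at freeze time."""
    return assembly_checksum(fasta_path) == recorded_digest
```

Storing the digest alongside the annotation files makes it possible to confirm later that the annotation still refers to exactly the assembly it was built on. Any byte-level change, including a reformatted header line, changes the digest.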


Summary

Introduction

0.1 Genomics in 2018 cannot possibly be done without a pan-genome perspective. This has been true for years in the area of microbiology, where projects rarely assemble and annotate a single genome, but rather a few dozen related strains. This is becoming state-of-the-art for genomic projects of crops and model plants, such as Brachypodium distachyon, as well as in human medicine. A couple of sentences should be added explaining that in this context a group of genomes is sequenced, assembled and annotated in parallel, which makes it more challenging and facilitates spotting and correcting errors.

0.2 I would add to the checklist a literature survey to identify related genomes

The text says
Investigate the properties of the genomes you study
Extract high quality DNA
Findings
Assemble your genome