Abstract

Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable and now fit the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. However, generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often work only in limited contexts. In genomics, a high-quality genome assembly and annotation has become an indispensable tool for better understanding the biology of any species. Herein, we outline 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a broad community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.

Highlights

  • Genomic applications in aquatic species that are potentially important for aquaculture have advanced more slowly than those in humans, livestock, and crops [19,20,21], a lag compounded by greater species diversity, a lack of reference genomes, and less mature aquaculture industries

  • While we have provided a brief summary of commonly used tools (Table 2), the comprehensive program list focused on third-generation sequencing (TGS) reads can be accessed at LRS-DB

  • (3) Annotation: National Center for Biotechnology Information (NCBI) or European Bioinformatics Institute (EBI) → If not, proceed with a semiautomatic pipeline starting from structural annotation → RepeatMasker → Ab initio Augustus training with MAKER → Evidence-based prediction (RNA-seq) with MAKER → Noncoding RNA prediction with NONCODE → Functional annotation with Blast2GO → Genome Browser
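
The annotation route in the last highlight can be read as a simple decision followed by an ordered list of stages. The sketch below is only an illustration of that ordering, not the authors' implementation: the names `Stage`, `SEMIAUTOMATIC_STAGES`, and `plan_annotation` are hypothetical, no tool is actually invoked, and placing the genome browser as the final step of the semiautomatic branch only is an assumption about how the flow should be read.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the semiautomatic annotation route (hypothetical helper)."""
    name: str
    tool: str


# Order follows the highlight; nothing here runs the tools themselves.
SEMIAUTOMATIC_STAGES = (
    Stage("Repeat masking", "RepeatMasker"),
    Stage("Ab initio Augustus training", "MAKER"),
    Stage("Evidence-based prediction (RNA-seq)", "MAKER"),
    Stage("Noncoding RNA prediction", "NONCODE"),
    Stage("Functional annotation", "Blast2GO"),
)


def plan_annotation(public_annotation_available: bool) -> list[str]:
    """Return the ordered list of steps as human-readable strings."""
    if public_annotation_available:
        # An NCBI or EBI annotation already exists, so no pipeline is planned.
        return ["Use the existing NCBI or EBI annotation"]
    steps = [f"{stage.name} with {stage.tool}" for stage in SEMIAUTOMATIC_STAGES]
    # Assumption: visualization closes the semiautomatic branch.
    steps.append("Load the results into a genome browser")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(plan_annotation(False), start=1):
        print(f"{number}. {step}")
```

Keeping the stages in a single ordered tuple makes it easy to swap a tool or insert a step without touching the decision logic.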


Summary

Introduction

The revolution in new sequencing technologies and computational developments has allowed researchers to drive advances in genome assembly and annotation, making the process better, faster, and cheaper for key model organisms [1,2]. Such technical advantages and established recommendations and strategies have been widely applied in humans [3,4,5,6], terrestrial animals [7,8,9,10,11,12], and plants and crops [13,14,15,16,17,18]. The generation of genetic linkage maps has been successfully applied to recognize key components in the sustainable production of aquaculture species [41,42]. These efforts have led to an emphasis on genomic evaluation and selection, or on advanced selective breeding programs, for desirable traits such as growth, sex determination, sex markers, and disease resistance [42].
