Abstract

Background

Functional genomic analyses rely on high-quality genome assemblies and annotations. Highly contiguous genome assemblies have become available for a variety of species, but accurate and complete annotation of gene models, inclusive of alternative splice isoforms and transcription start and termination sites, remains difficult with traditional approaches.

Results

Here, we utilized full-length isoform sequencing (Iso-Seq), a long-read RNA sequencing technology, to obtain a comprehensive annotation of the transcriptome of the ant Harpegnathos saltator. The improved genome annotations include additional splice isoforms and extended 3′ untranslated regions for more than 4000 genes. Reanalysis of RNA-seq experiments using these annotations revealed several genes with caste-specific differential expression and tissue- or caste-specific splicing patterns that were missed in previous analyses. The extended 3′ untranslated regions afforded great improvements in the analysis of existing single-cell RNA-seq data, resulting in the recovery of the transcriptomes of 18% more cells. The deeper single-cell transcriptomes obtained with these new annotations allowed us to identify additional markers for several cell types in the ant brain, as well as genes differentially expressed across castes in specific cell types.

Conclusions

Our results demonstrate that Iso-Seq is an efficient and effective approach to improve genome annotations and maximize the amount of information that can be obtained from existing and future genomic datasets in Harpegnathos and other organisms.

Highlights

  • Introduction: Contiguous genome assemblies have become available for a variety of species, but accurate and complete annotation of gene models, inclusive of alternative splice isoforms and transcription start and termination sites, remains difficult with traditional approaches

  • Functional genomic analyses rely on high-quality genome assemblies and annotations

  • Using isoform sequencing (Iso-Seq) to update Harpegnathos gene annotation: We previously generated a single-cell RNA-seq atlas of the Harpegnathos brain during the worker–gamergate transition and discovered extensive changes in cell type composition in glia and neurons [22]. While inspecting these sequencing data [23], we noticed that in many cases, even when using the latest NCBI annotation (NCBI Release 102; hereafter referred to as HSAL50), the single-cell RNA-seq reads mapped outside gene model boundaries, typically downstream of the annotated transcription termination sites (TTSs), resulting in decreased coverage and information loss
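The coverage loss described above can be illustrated with a toy model of read-to-gene assignment. This is a hypothetical sketch, not the pipeline used in the study: it assumes BED-style 0-based half-open gene intervals, represents each read by a single aligned position, ignores read strand, and uses made-up coordinates. The names Gene, extend_3prime, and assign_reads are illustrative only. Reads that land downstream of a truncated annotated TTS go unassigned, and extending the gene's 3′ boundary recovers them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost base, 0-based inclusive (BED-style)
    end: int     # rightmost boundary, exclusive
    strand: str  # "+" or "-"


def extend_3prime(gene: Gene, new_tts: int) -> Gene:
    """Move the gene's 3' boundary out to new_tts, never shrinking it.

    On the + strand the 3' end is the right edge; on the - strand it is the
    left edge. new_tts is the 0-based position of the last transcribed base.
    """
    if gene.strand == "+":
        return replace(gene, end=max(gene.end, new_tts + 1))
    if gene.strand == "-":
        return replace(gene, start=min(gene.start, new_tts))
    raise ValueError(f"unknown strand: {gene.strand!r}")


def assign_reads(read_positions, genes):
    """Assign each read (one aligned position) to the single gene containing it.

    Reads that hit no gene, or more than one gene, are counted as unassigned.
    """
    counts = {g.name: 0 for g in genes}
    unassigned = 0
    for pos in read_positions:
        hits = [g for g in genes if g.start <= pos < g.end]
        if len(hits) == 1:
            counts[hits[0].name] += 1
        else:
            unassigned += 1
    return counts, unassigned


# Hypothetical example: a + strand gene whose annotated 3' end stops short of
# where 3'-biased single-cell reads actually pile up.
genes = [Gene("geneA", 1000, 2000, "+")]
reads = [1500, 1950, 2100, 2300, 2450]

before = assign_reads(reads, genes)
after = assign_reads(reads, [extend_3prime(genes[0], 2499)])
print(before)  # ({'geneA': 2}, 3)
print(after)   # ({'geneA': 5}, 0)
```

Because droplet-based single-cell protocols that capture transcript 3′ ends concentrate reads near the TTS, a truncated 3′ boundary discards a disproportionate share of the signal, which is why the toy counts jump from 2 to 5 once the boundary is extended.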


Summary

Introduction

Contiguous genome assemblies have become available for a variety of species, but accurate and complete annotation of gene models, inclusive of alternative splice isoforms and transcription start and termination sites, remains difficult with traditional approaches. Improved sequencing technologies have enabled studies in previously inaccessible organisms, but annotations remain the bottleneck to thorough genomic and … identify the 5′ and 3′ untranslated regions (UTRs), resulting in inaccurate transcription start sites (TSSs) and transcription termination sites (TTSs) [4]. Although long-read DNA sequencing was used to great effect to improve the reference Harpegnathos genome assembly, the existing gene annotations still suffered from these shortcomings because they relied on traditional short-read RNA-seq coupled with gene prediction software.

Shields et al., BMC Biology (2021) 19:254
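With full-length reads, the TSS and TTS of each isoform are directly observable as its first and last transcribed bases, rather than inferred by gene prediction software. As a minimal sketch, assuming exon coordinates are 0-based half-open intervals, the helpers below recover both ends in a strand-aware way and measure how far an observed TTS lies downstream of an annotated one. The names transcript_ends and utr3_extension are hypothetical and are not part of the study's methods.

```python
def transcript_ends(exons, strand):
    """Return (tss, tts) for one full-length transcript.

    exons: iterable of (start, end) half-open, 0-based intervals, in any order.
    Both returned values are 0-based positions of the first and last
    transcribed bases; which genomic edge is the 5' end depends on strand.
    """
    exons = list(exons)
    if not exons:
        raise ValueError("transcript has no exons")
    left = min(start for start, _ in exons)
    right = max(end for _, end in exons) - 1  # last base, inclusive
    if strand == "+":
        return left, right
    if strand == "-":
        return right, left
    raise ValueError(f"unknown strand: {strand!r}")


def utr3_extension(annotated_tts, observed_tts, strand):
    """Bases by which an observed TTS lies downstream of the annotated TTS.

    Positive values mean the read extends the 3' end; negative values mean
    it stops upstream of the annotation.
    """
    if strand == "+":
        return observed_tts - annotated_tts
    if strand == "-":
        return annotated_tts - observed_tts
    raise ValueError(f"unknown strand: {strand!r}")


# Hypothetical two-exon isoform, read on each strand.
print(transcript_ends([(300, 450), (100, 200)], "+"))  # (100, 449)
print(transcript_ends([(300, 450), (100, 200)], "-"))  # (449, 100)
print(utr3_extension(449, 899, "+"))  # 450
print(utr3_extension(100, 40, "-"))   # 60
```

The strand check is the essential step: on the minus strand the 3′ end is the leftmost coordinate, so "downstream" means a smaller genomic position.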

