Challenges and advances for transcriptome assembly in non-model species.

Arnaud Ungaro, Rémi Chappaz, Jean-Philippe Mévy, Jean-François Martin, André Gilles, Nicolas Pech, R J Scott Mccairns

doi:10.1371/journal.pone.0185020

Abstract

Analyses of high-throughput transcriptome sequences of non-model organisms are based on two main approaches: de novo assembly and genome-guided assembly, which uses mapping to assign reads prior to assembly. Given the limits of mapping reads to a highly divergent reference, as is frequently the case for non-model species, we evaluate whether blastn outperforms mapping methods for read assignment in such situations (>15% divergence). We demonstrate its high performance by using simulated reads of lengths corresponding to those generated by the most common sequencing platforms, over a realistic range of genetic divergence (0% to 30%). Here we focus on gene identification and not on resolving the whole set of transcripts (i.e. the complete transcriptome). For simulated datasets, the transcriptome-guided assembly based on blastn recovers 94.8% of genes irrespective of read length at 0% divergence; however, the read assignment rate decreases with increasing divergence and with decreasing read length. Nevertheless, we still recover 92.6% of genes at 30% divergence irrespective of read length. This analysis also produces a categorization of genes relative to their assignment, and suggests guidelines for data processing prior to analyses of comparative transcriptomics and gene expression, to minimize potential inferential bias associated with incorrect transcript assignment. We also compare the performance of de novo assembly alone versus in combination with a transcriptome-guided assembly based on blastn, both via simulation and empirically, using data from a cyprinid fish species and from an oak species. For every simulated scenario, the transcriptome-guided assembly using blastn outperforms the de novo approach alone, including when the divergence level is beyond the reach of traditional mapping methods.
Combining de novo assembly and a related reference transcriptome for read assignment also addresses the bias and error in contigs caused by dependence on a related reference alone. Empirical data corroborate these findings when assembling transcriptomes from the two non-model organisms: Parachondrostoma toxostoma (fish) and Quercus pubescens (plant). For the fish species, out of the 31,944 genes known from Danio rerio, the guided and de novo assemblies recover 20,605 and 20,032 genes, respectively, but the guided assembly performs much better on both the contiguity and completeness metrics. For the oak, out of the 29,971 genes known from Vitis vinifera, the transcriptome-guided and de novo assemblies display similar performance, but the new guided approach detects 16,326 genes whereas the de novo assembly detects only 9,385 genes.
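As an illustration of the read-assignment step described in the abstract, the following is a minimal sketch, not the authors' pipeline. It assumes blastn was run with the standard 12-column tabular output (`-outfmt 6`) and assigns each read to the reference transcript of its highest-bitscore hit. The function name `assign_reads`, the `max_evalue` cutoff of 1e-5, and the toy hit lines are all hypothetical; the filtering criteria actually used in the paper are not given in this section.

```python
import csv
import io
from collections import defaultdict

# Column order of the default `blastn -outfmt 6` tabular report.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def assign_reads(tabular_text, max_evalue=1e-5):
    """Map each reference id to the reads whose best hit falls on it.

    max_evalue is an illustrative cutoff, not a value taken from the paper.
    """
    best = {}  # read id -> (bitscore, reference id) of the best hit so far
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue  # discard weak hits
        score = float(row["bitscore"])
        read_id = row["qseqid"]
        # Keep the first hit seen when two hits tie on bitscore.
        if read_id not in best or score > best[read_id][0]:
            best[read_id] = (score, row["sseqid"])

    assigned = defaultdict(list)
    for read_id, (_, ref_id) in best.items():
        assigned[ref_id].append(read_id)
    return dict(assigned)


# Toy hits (hypothetical): read1 matches two genes, read3 only has a weak hit.
hits = "\n".join([
    "read1\tgeneA\t95.0\t100\t5\t0\t1\t100\t201\t300\t1e-40\t150",
    "read1\tgeneB\t85.0\t100\t15\t0\t1\t100\t11\t110\t1e-20\t90",
    "read2\tgeneB\t90.0\t100\t10\t0\t1\t100\t51\t150\t1e-30\t120",
    "read3\tgeneC\t75.0\t40\t10\t0\t1\t40\t5\t44\t0.5\t30",
])
assignment = assign_reads(hits)
recovered_genes = sorted(assignment)
```

In this sketch a gene counts as recovered when at least one read is assigned to it, so `len(assignment)` divided by the number of reference genes gives a recovery rate of the kind reported above. Reads grouped per reference gene could then be assembled separately, which is one plausible reading of "transcriptome-guided assembly".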

Highlights

Synthesis and maturation of RNAs is an elemental cog in the cellular machinery
Reads were assigned to a reference transcriptome by using blastn to find regions of similarity between the reads and the reference
The approach was evaluated on data simulated from Danio rerio, with read lengths of 100 to 350 bp, simulated divergence of 0% to 30% relative to the original D. rerio sequences, and 10X uniform coverage
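The simulation design in the highlights (read lengths of 100 to 350 bp, divergence of 0% to 30%, 10X coverage) can be sketched roughly as follows. This is an assumption-laden illustration rather than the authors' simulator: divergence is modeled as substitutions only (no indels), reads start at uniformly random positions, and the names `diverge` and `simulate_reads` as well as the random 1,000 bp "transcript" are hypothetical.

```python
import random


def diverge(seq, divergence, rng):
    """Return seq with a fraction `divergence` of positions substituted.

    Substitution-only model (an assumption); each chosen position
    receives a base different from the original one.
    """
    bases = list(seq)
    n_sub = round(divergence * len(bases))
    for pos in rng.sample(range(len(bases)), n_sub):
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


def simulate_reads(transcript, read_len, coverage, rng):
    """Draw enough fixed-length reads to reach the target mean coverage."""
    n_reads = max(1, round(coverage * len(transcript) / read_len))
    last_start = max(0, len(transcript) - read_len)
    starts = (rng.randint(0, last_start) for _ in range(n_reads))
    return [transcript[s:s + read_len] for s in starts]


rng = random.Random(42)
# Hypothetical 1,000 bp reference transcript.
reference = "".join(rng.choice("ACGT") for _ in range(1000))
diverged = diverge(reference, 0.30, rng)          # 30% divergence
reads = simulate_reads(diverged, 100, 10, rng)    # 100 bp reads, 10X coverage
n_mismatches = sum(a != b for a, b in zip(reference, diverged))
```

With these settings the number of reads follows from coverage × transcript length / read length, so a 1,000 bp transcript at 10X yields 100 reads of 100 bp. Random start positions give uniform coverage only in expectation, with lower depth near the transcript ends; a tiled design would be needed for strictly uniform depth.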

Summary

Introduction

Synthesis and maturation of RNAs is an elemental cog in the cellular machinery. Inherently noisy, transcriptional variation can be associated with basal/fundamental processes such as enzyme activity [1] and protein production [2]. The quantification of RNA abundance remains an essential link in deciphering the genotype-phenotype map. In this context, transcriptome inference (i.e. in silico assembly and annotation) is an initial and requisite basis for studying gene expression [6,7]. After two decades of RNA microarrays [8], RNA-seq has democratized the analysis of transcriptomes for any non-model organism. This technological innovation has spread to several new uses in multiple domains in the life sciences, from direct applications such as transcript annotation [9,10], to providing insights into cis and trans regulation in allopolyploid species [11], speciation [11,12], heat stress [13], ecotoxicology [14] and ecology and evolution in general [15].

Journal: PLOS ONE	Publication Date: Sep 20, 2017
Citations: 37	License type: CC BY 4.0
