A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing

Nam V Hoang,Patrick J Mason,Agnelo Furtado,Robert J Henry,Lakshmi Kasirajan,Prathima P Thirugnanasambandam,Annelie Marquardt,Frederik C Botha

doi:10.1186/s12864-017-3757-8

Abstract

Background
Despite the economic importance of sugarcane in sugar and bioenergy production, there is not yet a reference genome available. Most sugarcane transcriptomic studies have been based on Saccharum officinarum gene indices (SoGI), expressed sequence tags (ESTs) and de novo assembled transcript contigs from short reads. As a result, knowledge of the sugarcane transcriptome is limited with respect to transcript length and the number of transcript isoforms.

Results
The sugarcane transcriptome was sequenced using PacBio isoform sequencing (Iso-Seq) of a pooled RNA sample derived from leaf, internode and root tissues at different developmental stages from 22 varieties, to explore the potential for capturing full-length transcript isoforms. A total of 107,598 unique transcript isoforms were obtained, representing about 71% of the total number of predicted sugarcane genes. The majority of this dataset (92%) matched the plant protein database, while just over 2% were novel transcripts and over 2% were putative long non-coding RNAs. About 56% and 23% of the total sequences were annotated against the gene ontology and KEGG pathway databases, respectively. Comparison with de novo contigs from Illumina RNA-Sequencing (RNA-Seq) of the internode samples from the same experiment, and with public databases, showed that the Iso-Seq method recovered more full-length transcript isoforms and had a higher N50 and a greater average length of the largest 1,000 proteins, whereas RNA-Seq captured a greater representation of gene content and RNA diversity. Only 62% of PacBio transcript isoforms matched 67% of the de novo contigs. The unmatched PacBio isoforms were attributed to the inclusion of leaf and root tissues and to normalization in the PacBio library, whereas the unmatched de novo contigs were attributed to the broader representation of gene content and RNA classes in the de novo assembly. About 69% of PacBio transcript isoforms and 41% of de novo contigs aligned with the sorghum genome, indicating high conservation of orthologs in the genic regions of the two genomes.

Conclusions
The transcriptome dataset should contribute to improved sugarcane gene models and protein predictions, and will serve as a reference database for the analysis of transcript expression in sugarcane.
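The abstract compares the Iso-Seq and de novo assemblies by N50 and by the average length of the largest 1,000 proteins. For readers unfamiliar with these metrics, the following is a minimal sketch of how they are commonly computed from a list of sequence lengths. It is not the authors' pipeline; the function names `n50` and `mean_of_largest` are hypothetical, and the toy lengths are illustrative only, not data from the study.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled length."""
    if not lengths:
        raise ValueError("n50() requires at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


def mean_of_largest(lengths: Sequence[int], n: int = 1000) -> float:
    """Average length of the n longest sequences (all of them if fewer than n)."""
    if not lengths:
        raise ValueError("mean_of_largest() requires at least one length")
    top = sorted(lengths, reverse=True)[:n]
    return sum(top) / len(top)


# Toy lengths for illustration only; these are not data from the study.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))                   # 8
print(mean_of_largest(toy_lengths, n=2))  # 9.5
```

Applied to predicted protein lengths with the default `n=1000`, `mean_of_largest` gives the "average length of the largest 1,000 proteins" style of metric; applied to transcript or contig lengths, `n50` gives the assembly N50.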

Highlights

Despite the economic importance of sugarcane in sugar and bioenergy production, there is not yet a reference genome available
The transcript data generated in this study probably accounts for about 71% of the total predicted genes in the sugarcane genome
The majority of transcript isoforms captured by PacBio isoform sequencing (Iso-Seq) were protein-coding sequences (93.5% contained an open reading frame (ORF) of ≥100 aa), whereas only 54.2% of the total RNA-Seq de novo contigs contained ORFs
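The ORF criterion in the highlights (≥100 aa) can be illustrated with a simplified six-frame scan. This is a minimal sketch, not the ORF prediction method used in the study, which is not described in this section. It assumes the standard genetic code, counts only complete ORFs running from an ATG to an in-frame stop codon, and ignores partial ORFs that run off a transcript end. The helper names `longest_orf_aa` and `has_orf` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_aa(seq: str) -> int:
    """Length in amino acids (stop codon excluded) of the longest
    ATG-initiated, stop-terminated ORF across all six reading frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            # Step codon by codon; i + 3 never runs past the end.
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None  # look for the next ATG in this frame
    return best


def has_orf(seq: str, min_aa: int = 100) -> bool:
    """True if the sequence carries a complete ORF of at least min_aa residues."""
    return longest_orf_aa(seq) >= min_aa


# Synthetic example: ATG + 99 alanine codons + TAA encodes a 100-aa ORF.
example = "ATG" + "GCT" * 99 + "TAA"
print(longest_orf_aa(example))  # 100
print(has_orf(example))         # True
```

Dedicated ORF predictors typically also handle partial ORFs, alternative start codons and coding-potential scoring, so percentages from this sketch would not necessarily reproduce the figures reported above.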

Summary

Introduction

Despite the economic importance of sugarcane in sugar and bioenergy production, there is not yet a reference genome available. Most sugarcane studies, including the transcriptome studies found in the literature (e.g. [10] and [11]), have been based on sorghum genomic and transcript sequences [12], which show the highest gene synteny and orthologous alignment with the sugarcane genome [13]; sugarcane expressed sequence tags (ESTs) [14]; the Saccharum officinarum gene indices, SoGI v3.0 [15], representing ~90% of the estimated genes in S. officinarum [2, 3]; and other resources reviewed in [16, 17]. Studies based on these databases have provided useful information on the sugarcane transcriptome while a whole genome sequence remains unavailable. There is a need to construct full-length (FL) transcript sequences, including transcript isoforms, to facilitate analysis of isoform differential expression and to extend our understanding of the sugarcane transcriptome.



Journal: BMC Genomics	Publication Date: May 22, 2017
Citations: 157	License type: open-access
