Abstract

Background
Gossypium australe F. Mueller (2n = 2x = 26, G2 genome) possesses valuable characteristics. For example, its delayed gland morphogenesis trait makes cottonseed protein and oil edible while retaining resistance to biotic stress. However, the gene sequences of G. australe and their alternative splicing (AS) remain uncharacterized, which hinders exploration of species-specific biological morphogenesis.

Results
Here, we report the first full-length transcriptome sequencing of the Australian wild cotton species G. australe, using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq) of pooled cDNA from ten tissues to identify transcript loci and splice isoforms. We reconstructed the G. australe full-length transcriptome and identified 25,246 genes, 86 pre-miRNAs and 1468 lncRNAs. More than half of the genes (12,832; 50.83%) had two or more isoforms, suggesting a high degree of transcriptome complexity in G. australe. A total of 31,448 AS events of five major types were found among 9944 gene loci. Among these five types, intron retention was the most frequent, accounting for 68.85% of AS events. A total of 29,718 polyadenylation sites were detected from 14,536 genes, 7900 of which have alternative polyadenylation (APA) sites. In addition, using our AS event annotations, RNA-Seq short reads from germinating seeds showed that these events were differentially expressed during seed germination. Ten randomly selected AS events were further confirmed by RT-PCR amplification in leaves and germinating seeds.

Conclusions
The reconstructed gene sequences of G. australe and their AS events provide information for exploring the beneficial characteristics of this species.
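As a quick consistency check on the abstract's figures, the short Python sketch below recomputes two of the reported proportions from the raw counts and derives the approximate number of intron retention events, which the abstract gives only as a percentage. The variable names are illustrative, and the derived intron retention count is an estimate from the stated percentage, not a figure reported by the authors.

```python
# Counts taken from the abstract; the derived values are simple ratios.
total_genes = 25_246          # genes identified in the reconstructed transcriptome
multi_isoform_genes = 12_832  # genes with two or more isoforms
as_events = 31_448            # alternative splicing (AS) events
ir_fraction = 0.6885          # reported share of AS events that are intron retention
polya_genes = 14_536          # genes with at least one detected polyadenylation site
apa_genes = 7_900             # of those, genes with alternative polyadenylation (APA)


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


multi_isoform_pct = percent(multi_isoform_genes, total_genes)
ir_events = round(as_events * ir_fraction)  # estimate, not a reported count
apa_pct = percent(apa_genes, polya_genes)

print(f"Genes with >=2 isoforms: {multi_isoform_pct}%")
print(f"Approx. intron retention events: {ir_events}")
print(f"APA genes among polyadenylated genes: {apa_pct}%")
```

The first ratio reproduces the 50.83% stated in the abstract. The APA share (about 54%) is relative to the 14,536 genes with detected polyadenylation sites, not to all 25,246 genes.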

Highlights

  • The analyses indicated that 61,805 nonredundant isoforms (87.29% of the total) contained high-quality open reading frames (ORFs) ranging from 300 bp to 9345 bp in length (average 1171 bp); these ORFs belonged to 25,246 transcript loci (85.30% of the total)

  • A total of 1,001,928 circular consensus sequences (CCS) collapsed into 92,728 non-redundant isoforms derived from 31,904 transcript loci

  • Rarefaction analysis using a subset of the full-length (FL) reads revealed that sequencing depth was nearly saturated for transcript locus discovery. A total of 25,246 transcript loci (85.30%) contained high-quality ORFs (≥100 aa), accounting for about 63% of the total predicted genes in diploid cotton
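The highlights do not describe how the rarefaction analysis was run. The Python sketch below shows one common way to build such a curve, under stated assumptions: each FL read has already been assigned to a transcript locus, the reads are shuffled once, and distinct loci are counted in growing prefixes of that shuffle. The function name `rarefaction_curve` and the toy read-to-locus data are hypothetical.

```python
import random
from collections.abc import Sequence


def rarefaction_curve(
    read_loci: Sequence[str], fractions: Sequence[float], seed: int = 0
) -> list[tuple[int, int]]:
    """Count distinct transcript loci in growing random subsets of FL reads.

    read_loci holds one locus ID per FL read (hypothetical input format).
    Reads are shuffled once; each subset is a prefix of that shuffle, so
    the number of distinct loci can only grow as the subset grows.
    """
    shuffled = list(read_loci)
    random.Random(seed).shuffle(shuffled)
    curve = []
    for frac in sorted(fractions):
        n_reads = int(len(shuffled) * frac)
        n_loci = len(set(shuffled[:n_reads]))
        curve.append((n_reads, n_loci))
    return curve


# Toy data (hypothetical): 50 loci with uneven expression; locus i has i + 1 reads.
toy_reads = [f"locus{i}" for i in range(50) for _ in range(i + 1)]

curve = rarefaction_curve(toy_reads, [0.1, 0.25, 0.5, 0.75, 1.0])
for n_reads, n_loci in curve:
    print(f"{n_reads:5d} reads -> {n_loci:2d} loci")
```

A curve that flattens as the subset grows is what "nearly saturated" means here: additional reads contribute few new loci. Using prefixes of a single shuffle, rather than independent draws at each depth, keeps the toy curve monotonic.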



Introduction

Gossypium australe, a wild diploid cotton species (2n = 2x = 26, G2 genome), grows in a limited area of central and northern Australia. This species has many desirable characteristics, such as resistance to pests (aphids and mites) and diseases (Fusarium and Verticillium wilts) and tolerance to abiotic stresses (such as salinity, heat and drought). It contains immature lysigenous glands but no terpenoid aldehydes, and its pigment glands appear only after seed germination; little gossypol is deposited in its dormant seeds [4, 5]. This distinguishing characteristic, called delayed gland morphogenesis, has the potential to enable the

