Abstract
Background
Anther development has been extensively studied at the transcriptional level, but a systematic analysis of full-length transcripts on a genome-wide scale has not yet been published. Here, the Pacific Biosciences (PacBio) Sequel platform and next-generation sequencing (NGS) technology were combined to generate full-length sequences and complete transcript structures in anthers of Chinese cabbage.
Results
Using single-molecule real-time (SMRT) sequencing, a total of 1,098,119 circular consensus sequences (CCSs) were generated, with a mean length of 2664 bp. More than 75% of the CCSs were classified as full-length non-chimeric (FLNC) reads. After error correction, the 725,731 high-quality FLNC reads were estimated to represent 51,501 isoforms from 19,503 loci, including 38,992 novel isoforms from known genes and 3691 novel isoforms from novel genes. Among the novel isoforms, we identified 407 long non-coding RNAs (lncRNAs) and 37,549 open reading frames (ORFs). Furthermore, a total of 453,270 alternative splicing (AS) events were identified, and the majority of AS events in anthers were classified as approximate exon skipping (XSKIP). Among the key genes regulated during anther development, AS events were identified mainly in SERK1, CALS5, NEF1, and CESA1/3. Additionally, we identified 104 fusion transcripts and 5806 genes with alternative polyadenylation (APA).
Conclusions
Our work demonstrates the transcriptome diversity and complexity of anther development in Chinese cabbage. The findings provide a basis for further genome annotation and transcriptome research in Chinese cabbage.
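The distinction between CCS reads and full-length non-chimeric (FLNC) reads can be illustrated with a simplified classifier. This is a minimal sketch, not the Iso-Seq software or the authors' pipeline: it assumes reads are already oriented 5' to 3', and the primer sequences (`FIVE_PRIME`, `THREE_PRIME`) and poly(A) threshold (`MIN_POLYA`) are hypothetical values chosen for illustration. A read counts as full-length when both primers and a poly(A) tail are present, and as non-chimeric when no primer occurs inside the insert.

```python
from collections import Counter
from typing import List

# Hypothetical primer sequences and threshold, for illustration only; real
# Iso-Seq libraries use kit-specific primers that are not given in the source.
FIVE_PRIME = "AAGCAGTG"   # 5' primer as it appears at the start of an oriented read
THREE_PRIME = "GTACTCTG"  # 3' primer as it appears at the end of an oriented read
MIN_POLYA = 8             # minimum poly(A) run required just before the 3' primer


def classify_ccs(read: str) -> str:
    """Classify an oriented CCS read as 'FLNC', 'FL_chimeric', or 'non_FL'."""
    if not (read.startswith(FIVE_PRIME) and read.endswith(THREE_PRIME)):
        return "non_FL"  # at least one primer missing: not full-length
    insert = read[len(FIVE_PRIME):len(read) - len(THREE_PRIME)]
    if not insert.endswith("A" * MIN_POLYA):
        return "non_FL"  # no poly(A) tail adjacent to the 3' primer
    if FIVE_PRIME in insert or THREE_PRIME in insert:
        return "FL_chimeric"  # an internal primer suggests concatenated molecules
    return "FLNC"


def flnc_fraction(reads: List[str]) -> float:
    """Fraction of reads classified as full-length non-chimeric."""
    counts = Counter(classify_ccs(r) for r in reads)
    return counts["FLNC"] / len(reads) if reads else 0.0


if __name__ == "__main__":
    good = FIVE_PRIME + "CCGGTTCA" + "A" * 10 + THREE_PRIME
    chimera = good + good  # two full-length molecules joined end to end
    print(classify_ccs(good), classify_ccs(chimera))  # FLNC FL_chimeric
    print(flnc_fraction([good, chimera]))             # 0.5
```

Real pipelines also tolerate sequencing errors in primer matches and handle both read orientations; exact string matching is used here only to keep the logic visible.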
Highlights
Anther development has been extensively studied at the transcriptional level, but a systematic analysis of full-length transcripts on a genome-wide scale has not yet been published.
Transcriptome sequencing and error correction: Because short-read RNA-Seq on the Illumina platform has limited capacity to resolve full-length transcripts, anther-specific transcriptome analysis of the Chinese cabbage doubled haploid (DH) line ‘FT’ (Fig. 1 a-c) was carried out using the Pacific Biosciences (PacBio) Sequel platform.
To identify transcripts as completely as possible, high-quality total mRNA was extracted from each of the pooled samples collected throughout anther development, and the extracts were mixed to obtain full-length sequences and splice variants.
Summary
Anther development has been extensively studied at the transcriptional level, but a systematic analysis of full-length transcripts on a genome-wide scale has not yet been published. The advent of NGS technologies, such as the ABI SOLiD, Illumina Solexa, and Roche 454 systems, stimulated structural and functional genomics studies in diverse plant species. Among these technologies, Illumina sequencing offers high accuracy, high throughput, high sensitivity, and low cost, and it is the most widely used platform for genome sequencing [2]. Because NGS reads are short, RNA-Seq reads must be assembled into longer contigs [15], a process that is prone to misassembly when reads are transcribed from highly repetitive regions or from similar members of multigene families [16]. This problem can be even more severe in polyploid plants, whose coexisting subgenomes often share high sequence similarity, and it frequently leads indirectly to annotation errors. Approximately 83.4% of multi-exon genes in A. thaliana undergo AS, which increases organismal protein diversity without a large increase in the number of genes [17].
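How alternative splicing produces more than one isoform from a single gene can be made concrete with exon skipping, the event class that dominated the anther data. The sketch below detects single-exon skipping between two isoforms given as exon coordinates. It is a simplified illustration under stated assumptions (both isoforms on the same strand, exact splice-site matching, hypothetical helper names `introns` and `skipped_exons`), not the classification tool used in the study; the approximate exon skipping (XSKIP) category reported in the abstract relaxes exact boundary matching in a tool-specific way that is not reproduced here.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end) genomic coordinates, same strand assumed


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Return (donor, acceptor) pairs: the end of one exon and the start of the next."""
    ordered = sorted(exons)
    return [(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)]


def skipped_exons(skipping: List[Exon], including: List[Exon]) -> List[Exon]:
    """Exons of `including` that `skipping` splices over with a single intron.

    Strict exon skipping: an intron (d, a) in `skipping` must share its donor d
    with the intron before the exon in `including`, and its acceptor a with the
    intron after that exon.
    """
    inc_sorted = sorted(including)
    inc_introns = introns(including)
    events = []
    for donor, acceptor in introns(skipping):
        for i in range(len(inc_introns) - 1):
            upstream, downstream = inc_introns[i], inc_introns[i + 1]
            if upstream[0] == donor and downstream[1] == acceptor:
                events.append(inc_sorted[i + 1])  # exon between the two introns
    return events


if __name__ == "__main__":
    long_isoform = [(100, 200), (300, 400), (500, 600)]
    short_isoform = [(100, 200), (500, 600)]
    print(skipped_exons(short_isoform, long_isoform))  # [(300, 400)]
```

Both isoforms here come from one gene, so a single locus yields two distinct transcripts (and potentially two proteins) without any increase in gene number.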