Abstract
RNA-sequencing has been widely used to obtain high-throughput transcriptome sequences in various species, but assembling a full set of complete transcripts remains a significant challenge. Judging by the number of expected transcripts and the number of assembled unigenes in a transcriptome library, we believe that some unigenes could be assembled further. In this study, using the nitrate transporter (NRT) and phosphate transporter (PHT) gene families in Salicornia europaea as examples, we introduce an approach to further assemble unigenes in transcriptome libraries that had previously been generated by Trinity. To find unigenes that belong to the same transcript but are separated by a gap, we selected 16 NRT and 12 PHT candidate unigene pairs in which the two unigenes had the same annotation, the same expression pattern across the RNA-seq samples, and encoded protein fragments that mapped to different positions on a reference protein. To fill the gap between the two unigenes of a pair, PCR was performed using primers that mapped to the two unigenes, and the PCR products were sequenced. The results demonstrated that 5 NRT unigene pairs and 3 PHT unigene pairs could be reassembled once the gaps were filled with the corresponding PCR product sequences. This fast and simple method reduces the redundancy of targeted unigenes and allows complete coding sequences (CDS) to be acquired.
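The three selection criteria named in the abstract (same annotation, same expression pattern, different positions on a reference protein) can be illustrated with a small sketch. This is not the authors' pipeline: the `Unigene` record, the use of Pearson correlation as a stand-in for "same expression pattern", and the `min_corr` and `max_overlap` thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from math import sqrt


@dataclass(frozen=True)
class Unigene:
    """Hypothetical record for one assembled unigene."""
    uid: str            # unigene identifier
    annotation: str     # e.g. the best Nr or Swiss-Prot hit
    expression: tuple   # expression values, one per RNA-seq sample
    ref_start: int      # first aligned residue on the reference protein
    ref_end: int        # last aligned residue on the reference protein


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def ref_overlap(a, b):
    """Number of reference-protein residues covered by both unigenes."""
    return max(0, min(a.ref_end, b.ref_end) - max(a.ref_start, b.ref_start) + 1)


def candidate_pairs(unigenes, min_corr=0.9, max_overlap=0):
    """Return (upstream_id, downstream_id) pairs that satisfy all three criteria.

    min_corr and max_overlap are illustrative thresholds, not values from the study.
    """
    pairs = []
    for a, b in combinations(unigenes, 2):
        if a.annotation != b.annotation:
            continue  # criterion 1: same annotation
        if pearson(a.expression, b.expression) < min_corr:
            continue  # criterion 2: similar expression pattern (assumed metric)
        if ref_overlap(a, b) > max_overlap:
            continue  # criterion 3: different positions on the reference protein
        first, second = sorted((a, b), key=lambda u: u.ref_start)
        pairs.append((first.uid, second.uid))
    return pairs
```

Each returned pair is ordered by position on the reference protein, which is the orientation one would need when designing primers that span the gap. Whether a pair truly belongs to one transcript still has to be confirmed experimentally, as the abstract describes.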
Highlights
Whole transcriptome sequencing (RNA-seq) with next-generation sequencing (NGS) technology has been used to uncover the complex landscape and dynamics of transcriptomes in various plant species since the success of the massively parallel pyrosequencing of the Arabidopsis transcriptome (Weber et al., 2007).
We found a total of 118 nitrate transporter (NRT) unigenes according to the Nr or Swiss-Prot annotation, including 75 NRT1 unigenes, 37 NRT2 unigenes, and 6 NRT3 unigenes. Of these, only 10 NRT1 unigenes, 1 NRT2 unigene, and 5 NRT3 unigenes contained a complete coding sequence (CDS), as indicated by open reading frame (ORF) analysis and Blastx alignment with other species in GenBank (Table 2A).
Even after RNA-seq assembly with current assemblers, transcriptome libraries still contain unigenes of the same transcript that remain unassembled.
Summary
Whole transcriptome sequencing (RNA-seq) with next-generation sequencing (NGS) technology has been used to uncover the complex landscape and dynamics of transcriptomes in various plant species since the success of the massively parallel pyrosequencing of the Arabidopsis transcriptome (Weber et al., 2007). Because of the great sequencing depth it allows, RNA-seq can produce a nearly complete profile of a transcriptome, including rare transcripts. RNA-seq has many advantages, such as base-pair-level resolution, a large range of measurable expression levels, and de novo annotation (Martin and Wang, 2011). Because RNA-seq avoids the high cost of genome sequencing, ordinary laboratories can use it to produce transcriptome sequences for species of interest (Hamilton and Buell, 2012), and more than 50 different plant species have been sequenced using this technology (Schliesky et al., 2012). A reference genome is not available for most species, de