Abstract
Background: RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information at relatively low cost and time investment, even for non-model species. However, there remains much room for optimizing its workflow in order to take full advantage of continuously developing sequencing capacity.

Methods: Transcriptome sequencing for three embryonic stages of the Madagascar ground gecko (Paroedura picta) was performed on the Illumina platform. The output reads were assembled de novo to reconstruct transcript sequences. To evaluate the completeness of the transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs.

Results: To take advantage of read lengths greater than 150 nt, we shortened the RNA fragmentation time, which resulted in a dramatic shift of the insert size distribution. To evaluate the products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). Completeness assessment by the computational pipelines CEGMA and BUSCO, using CVG as the reference, achieved higher accuracy and resolution than with the gene set previously established for this purpose. Based on the assessment with CVG, we derived the most comprehensive transcript sequence set of the Madagascar ground gecko by assembling individual libraries separately and then clustering the assembled sequences based on their overall similarity.

Conclusion: Our results provide several insights into optimizing the de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies.
The approach and the assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well as to whole genome analyses.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2007-1) contains supplementary material, which is available to authorized users.
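The last step of the Results, assembling each library separately and then clustering the pooled contigs by overall similarity, can be illustrated with a minimal greedy clustering sketch. The similarity measure used here (the fraction of a shorter contig's k-mers found in a longer representative) is a stand-in chosen for brevity. It is not the clustering tool or criterion used in the study, and the function names, `k`, the 0.8 threshold, and the contig names are all hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of k-mers of a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query: Set[str], rep: Set[str]) -> float:
    """Fraction of the query's k-mers that also occur in the representative."""
    if not query:
        return 0.0
    return len(query & rep) / len(query)


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.8,
                   k: int = 5) -> Dict[str, List[str]]:
    """Cluster contigs greedily; each cluster is keyed by its longest member.

    Hypothetical stand-in for similarity-based redundancy removal: contigs are
    visited from longest to shortest, and each one joins the first
    representative containing at least `threshold` of its k-mers, or else
    founds a new cluster with itself as representative.
    """
    clusters: Dict[str, List[str]] = {}
    rep_kmers: Dict[str, Set[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        query = kmers(seqs[name], k)
        for rep, rep_set in rep_kmers.items():
            if containment(query, rep_set) >= threshold:
                clusters[rep].append(name)
                break
        else:  # no existing representative was similar enough
            clusters[name] = [name]
            rep_kmers[name] = query
    return clusters


if __name__ == "__main__":
    contigs = {
        "lib1_c1": "ACGTACGTTAGCCGATAGCTAGGCT",
        "lib2_c7": "TAGCCGATAGCT",          # fragment of lib1_c1
        "lib3_c2": "TTTTTGGGGGCCCCCAAAAA",  # unrelated contig
    }
    print(greedy_cluster(contigs))
```

Visiting contigs longest first keeps the most complete copy of each transcript as the cluster representative. Dedicated clustering tools also consider reverse complements and alignment-based identity, which this sketch ignores.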
Highlights
RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information at relatively low cost and time investment, even for non-model species
To take advantage of read lengths greater than 150 nt, we shortened the RNA fragmentation time, which resulted in a dramatic shift of the insert size distribution
To evaluate the products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). Completeness assessment by the computational pipelines Core Eukaryotic Genes Mapping Approach (CEGMA) and Benchmarking Universal Single-Copy Orthologs (BUSCO), using CVG as the reference, achieved higher accuracy and resolution than with the gene set previously established for this purpose
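The CVG criterion, keeping only genes that are one-to-one orthologs in every one of the 29 vertebrate genomes, amounts to filtering ortholog groups for single-copy presence in all species; completeness is then the share of those groups an assembly recovers. The simplified sketch below assumes a hypothetical ortholog table (group ID mapped to per-species gene lists) with toy data. It omits ortholog inference and the sequence searches that CEGMA and BUSCO actually perform, and it is not the procedure used to build CVG.

```python
from typing import Dict, List, Set

# Hypothetical input layout: ortholog group ID -> {species name: [gene IDs]}.
OrthoTable = Dict[str, Dict[str, List[str]]]


def core_one_to_one(groups: OrthoTable, species: Set[str]) -> Set[str]:
    """Return groups with exactly one gene copy in every species examined."""
    return {
        group_id
        for group_id, members in groups.items()
        if all(len(members.get(sp, [])) == 1 for sp in species)
    }


def completeness(recovered: Set[str], reference: Set[str]) -> float:
    """Percentage of reference groups recovered by an assembly."""
    if not reference:
        raise ValueError("reference gene set is empty")
    return 100.0 * len(recovered & reference) / len(reference)


if __name__ == "__main__":
    groups: OrthoTable = {
        "grpA": {"human": ["H1"], "chicken": ["C1"], "zebrafish": ["Z1"]},
        "grpB": {"human": ["H2a", "H2b"], "chicken": ["C2"], "zebrafish": ["Z2"]},  # two human copies
        "grpC": {"human": ["H3"], "chicken": ["C3"]},  # missing from zebrafish
        "grpD": {"human": ["H4"], "chicken": ["C4"], "zebrafish": ["Z4"]},
    }
    core = core_one_to_one(groups, {"human", "chicken", "zebrafish"})
    print(sorted(core))
    print(completeness({"grpA", "grpB"}, core))
```

Only grpA and grpD survive the filter, so an assembly recovering grpA and grpB scores 50%. With the 233-gene CVG as `reference`, recovering 230 genes would score about 98.7%.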
Summary
Transcriptome sequencing (RNA-seq) has become a standard strategy to capture the spatiotemporal expression of a genome. It enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information at relatively low cost and time investment, even for non-model species. The choice of sequencing mode influences the coverage of the transcriptome in de novo sequencing projects targeting sequence discovery, as well as expression profiling in differential gene expression analyses. For RNA-seq library preparation, few protocols offer a choice of insert lengths through variable RNA fragmentation conditions. The standard protocol for the Illumina TruSeq RNA Sample Prep Kit recommends intensive RNA fragmentation, which yields short inserts, so that a high proportion of library molecules have the middle of their inserts sequenced from both ends. To make full use of longer reads, it is preferable to prepare libraries with longer inserts by using moderate RNA fragmentation.
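The overlap problem described above follows from simple arithmetic: on an insert of length I, two mates of length L meet whenever I < 2L, and they re-read max(0, 2·min(L, I) − I) bases. The sketch below assumes idealized, untrimmed, full-length paired-end reads; the read length and insert sizes are illustrative values, not figures from the study, and the function names are hypothetical.

```python
def mate_overlap(insert_size: int, read_len: int) -> int:
    """Insert bases read by both mates of a paired-end fragment.

    Each mate reads min(read_len, insert_size) bases from its own end of the
    insert; any remainder of a read runs into adapter sequence.
    """
    if insert_size <= 0 or read_len <= 0:
        raise ValueError("insert_size and read_len must be positive")
    per_mate = min(read_len, insert_size)
    return max(0, 2 * per_mate - insert_size)


def redundant_fraction(insert_size: int, read_len: int) -> float:
    """Fraction of the pair's insert-derived bases that duplicate the other mate."""
    per_mate = min(read_len, insert_size)
    return mate_overlap(insert_size, read_len) / (2 * per_mate)


if __name__ == "__main__":
    read_len = 150  # illustrative 2 x 150 nt run
    for insert in (120, 200, 300, 450):  # hypothetical insert sizes
        print(insert, mate_overlap(insert, read_len),
              round(redundant_fraction(insert, read_len), 2))
```

With 2 x 150 nt reads, a 200 nt insert wastes a third of the insert-derived bases on re-reading the middle, while inserts of at least 300 nt (twice the read length) avoid mate overlap entirely.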