Abstract
RNA-Seq has become increasingly popular in transcriptome profiling. A major challenge in RNA-Seq data analysis is the accurate mapping of junction reads to their genomic origins. To detect splice sites in short reads, many RNA-Seq aligners use a reference transcriptome to inform the placement of junction reads. However, no systematic evaluation has been performed to assess or quantify the benefits of incorporating a reference transcriptome when mapping RNA-Seq reads. In this paper, we study the impact of a reference transcriptome on the mapping of RNA-Seq reads, especially junction reads. The same dataset was analysed with and without the RefGene transcriptome, and a Perl script was developed to analyse and compare the mapping results. About 50–55% of junction reads map to the same genomic regions regardless of whether the RefGene model is used. More than one-third of reads fail to be mapped without the help of a reference transcriptome. For "Alternatively" mapped reads, i.e., reads mapped differently with and without the RefGene model, the mappings obtained without the RefGene model are usually worse than the corresponding alignments obtained with it. Junction reads that span more than two exons are less likely to be aligned correctly without the assistance of a reference transcriptome. As sequencing technology evolves, read lengths keep increasing. Longer reads are more likely to span multiple exons, so mapping long junction reads without a reference transcriptome is becoming more challenging. The advantages of using a reference transcriptome demonstrated in this study are therefore becoming more evident for longer reads. In addition, the effect of the completeness of the reference transcriptome on the mapping of RNA-Seq reads is discussed.
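The comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Perl script. It assumes that each mapped read has already been reduced to a `(chromosome, start, CIGAR)` tuple keyed by read ID, and the function names and the labels `identical`, `unmapped_without_ref`, and `unmapped_with_ref` are hypothetical; only the "Alternatively" mapped category comes from the text. The exon count relies on the SAM convention that an `N` operation in a CIGAR string marks a skipped reference region, which for spliced reads is an intron.

```python
from collections import Counter


def exons_spanned(cigar):
    """Return the number of exons a read touches, judged from its CIGAR string.

    Each "N" operation marks a skipped reference region (an intron),
    so a read with k "N" operations spans k + 1 exons.
    """
    return cigar.count("N") + 1


def classify_reads(with_ref, without_ref):
    """Classify reads by comparing two mappings of the same dataset.

    Both arguments are dicts: read_id -> (chrom, start, cigar).
    A read that is absent from a dict is treated as unmapped in that run.
    The category labels are illustrative, not taken from the paper.
    """
    categories = {}
    for read_id in sorted(set(with_ref) | set(without_ref)):
        a = with_ref.get(read_id)
        b = without_ref.get(read_id)
        if a is not None and b is None:
            categories[read_id] = "unmapped_without_ref"
        elif a is None and b is not None:
            categories[read_id] = "unmapped_with_ref"
        elif a == b:
            categories[read_id] = "identical"
        else:
            # Mapped in both runs, but to a different position or
            # with a different splicing pattern.
            categories[read_id] = "alternative"
    return categories


# Toy input (made-up reads, for illustration only).
with_ref = {
    "r1": ("chr1", 100, "50M200N50M"),
    "r2": ("chr1", 500, "30M100N40M150N30M"),
    "r3": ("chr2", 10, "60M300N40M"),
}
without_ref = {
    "r1": ("chr1", 100, "50M200N50M"),
    "r2": ("chr1", 500, "30M100N70M"),
}

categories = classify_reads(with_ref, without_ref)
summary = Counter(categories.values())
print(summary)
```

In this toy input, `r2` spans three exons when mapped with the annotation but is placed with a single junction without it, which is the kind of case the abstract describes for reads that span more than two exons.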
Highlights
In recent years, RNA-Seq has become a popular and powerful approach for transcriptome profiling [1,2,3,4,5,6]
Short reads generated by RNA-Seq experiments must be aligned, or "mapped", to a reference genome or transcriptome assembly
The number of reads aligned to each feature approximates abundances of those features in the original sample. Such measures of digital gene expression are subject to comparison among samples or treatments in a statistical framework
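As a rough illustration of how aligned reads turn into digital gene expression values, the sketch below counts reads per feature. It assumes a simplified model in which each read is represented only by its chromosome and alignment start position, and each feature is a single closed interval; the function name and the toy data are hypothetical. Real counting tools handle spliced reads, strand, and overlapping features, all of which this sketch ignores.

```python
from collections import Counter


def count_reads_per_feature(read_positions, features):
    """Count reads whose start position falls inside each feature.

    read_positions: iterable of (chrom, position) tuples.
    features: dict feature_name -> (chrom, start, end), end inclusive.
    """
    # Start every feature at zero so that unexpressed features are reported.
    counts = Counter({name: 0 for name in features})
    for chrom, pos in read_positions:
        for name, (f_chrom, f_start, f_end) in features.items():
            if chrom == f_chrom and f_start <= pos <= f_end:
                counts[name] += 1
    return counts


# Toy data (made up for illustration).
features = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 300, 400),
    "geneC": ("chr2", 100, 200),
}
reads = [("chr1", 150), ("chr1", 160), ("chr1", 350), ("chr3", 150)]

counts = count_reads_per_feature(reads, features)
print(dict(counts))
```

The resulting per-feature counts are the raw quantities that would then be compared among samples or treatments in a statistical framework.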
Summary
RNA-Seq has become a popular and powerful approach for transcriptome profiling [1,2,3,4,5,6]. RNA-Seq has considerable advantages for examining transcriptome fine structure (for example, in the detection of novel transcripts, allele-specific expression, and alternative splicing) and provides a far more precise measurement of transcript levels than other methods such as microarrays [7,8,9,10]. RNA-Seq has a much broader dynamic range than microarrays, which allows the detection of more differentially expressed genes with higher fold change. RNA-Seq also avoids technical issues in microarrays related to probe performance, such as cross-hybridization, the limited detection range of individual probes, and nonspecific hybridization. RNA-Seq is becoming an attractive approach for profiling gene expression and evaluating differential expression [11].