Abstract

Deep sequencing of the transcriptome (RNA-seq) provides an unprecedented opportunity to interrogate plausible mRNA splicing patterns by mapping RNA-seq reads to exon junctions (hereafter, junction reads). In most previous studies, exon junctions were detected using the quantitative information of junction reads. This quantitative criterion (e.g. a minimum of two junction reads), although straightforward and widely used, usually results in high false positive and false negative rates, owing to the complexity of the transcriptome. Here, we introduce a new metric, Minimal Match on Either Side of exon junction (MMES), to measure the quality of each junction read, and subsequently implement an empirical statistical model to detect exon junctions. When applied to a large dataset (>200M reads) consisting of mouse brain, liver and muscle mRNA sequences, and using independent transcript databases as positive controls, our method proved considerably more accurate than previous ones, especially for detecting junctions originating from low-abundance transcripts. Our results were also confirmed by real-time RT-PCR assays. The MMES metric can be used either in this empirical statistical model or in other, more sophisticated classifiers, such as logistic regression.

Highlights

  • Alternative splicing (AS), which invalidates the old theory of “one gene, one protein”, enables higher eukaryotes to produce a large number of transcripts from a limited number of genes, and has been proposed as a primary driver of the evolution of phenotypic complexity in mammals [1]

  • A junction read with fewer mismatches will have a higher MMES score

  • MMES can give a rough estimate of the positions of mismatches: when a read is divided into a “long arm” and a “short arm” at the midpoint of the exon junction, mismatches on the long arm in most cases have no effect on the MMES score, whereas mismatches on the short arm reduce it
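
The behaviour described in the last two highlights can be illustrated with a short Python sketch. It assumes that MMES is simply the smaller of the two per-side counts of matched bases, which is our reading of the metric's name and of the highlights; the full text may define it differently. The function names `mmes` and `with_mismatch`, the ungapped alignment model and the toy sequences are all hypothetical and are not taken from the paper.

```python
def mmes(read: str, reference: str, junction_offset: int) -> int:
    """Smaller of the matched-base counts on the two sides of the junction.

    Assumption: the read is aligned without gaps to a spliced reference of
    the same length, and `junction_offset` is the number of read bases that
    fall on the upstream exon.
    """
    if len(read) != len(reference):
        raise ValueError("read and reference must have the same length")
    if not 0 < junction_offset < len(read):
        raise ValueError("the junction must fall strictly inside the read")

    # Count matching bases separately on each side of the junction.
    left = sum(r == g for r, g in zip(read[:junction_offset],
                                      reference[:junction_offset]))
    right = sum(r == g for r, g in zip(read[junction_offset:],
                                       reference[junction_offset:]))
    return min(left, right)


def with_mismatch(seq: str, pos: int) -> str:
    """Return a copy of `seq` with a different base substituted at `pos`."""
    substitute = "A" if seq[pos] != "A" else "C"
    return seq[:pos] + substitute + seq[pos + 1:]


# Toy 20-base spliced reference: 14 bases on the long arm, 6 on the short arm.
reference = "ACGTACGTACGTACGTACGT"
offset = 14

perfect = mmes(reference, reference, offset)
long_arm_hit = mmes(with_mismatch(reference, 0), reference, offset)
short_arm_hit = mmes(with_mismatch(reference, 19), reference, offset)
print(perfect, long_arm_hit, short_arm_hit)
```

Under this assumed definition, the mismatch on the long arm leaves the score at the short-arm length, while the mismatch on the short arm lowers it by one, which matches the qualitative behaviour stated above.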

Summary

Introduction

Alternative splicing (AS), which invalidates the old theory of “one gene, one protein”, enables higher eukaryotes to produce a large number of transcripts from a limited number of genes, and has been proposed as a primary driver of the evolution of phenotypic complexity in mammals [1]. Besides its relatively high cost, EST technology has many other limitations, including genomic contamination, cloning bias, paralog confusion, 3′ gene bias and low sensitivity in detecting low-abundance transcripts. It also requires considerable effort in data interpretation [5]. Whole-transcript microarrays were used to monitor 24,426 alternative splicing events in 48 human tissues and cell lines [8]. Although this technology has been used extensively, limitations persist, including limited probe coverage, cross-hybridization artifacts, the requirement for previously known gene structures and difficulties in data analysis.

