Abstract

Background: The advent of next-generation RNA sequencing (RNA-seq) has greatly advanced transcriptomic studies, including system-wide identification and quantification of mRNA isoforms under various biological conditions. A number of computational methods have been developed to systematically identify mRNA isoforms in a high-throughput manner from RNA-seq data. However, a common drawback of these methods is that their identified mRNA isoforms contain a high percentage of false positives, especially for genes with complex splicing structures, e.g., many exons and exon junctions.

Results: We have developed a preselection method called “Non-negative Matrix Factorization Preselection” (NMFP), which is designed to improve the accuracy of computational methods in identifying mRNA isoforms from RNA-seq data. We demonstrated through simulation and real data studies that NMFP can effectively shrink the search space of isoform candidates and increase the accuracy of two mainstream computational methods, Cufflinks and SLIDE, in their identification of mRNA isoforms.

Conclusion: NMFP is a useful tool to preselect mRNA isoform candidates for downstream isoform discovery methods. It can greatly reduce the number of isoform candidates while maintaining a good coverage of unknown true isoforms. With NMFP added as an upstream step, computational methods are expected to achieve better accuracy in identifying mRNA isoforms from RNA-seq data.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2304-8) contains supplementary material, which is available to authorized users.

Highlights

  • The advent of next-generation RNA sequencing (RNA-seq) has greatly advanced transcriptomic studies, including system-wide identification and quantification of messenger RNA (mRNA) isoforms under various biological conditions

  • The results show that Non-negative Matrix Factorization Preselection (NMFP) can help Cufflinks and SLIDE achieve better isoform discovery accuracy

  • Non-negative matrix factorization (NMF) is inherently capable of providing interpretable results for the problem of mRNA isoform discovery

Summary

Introduction

The advent of next-generation RNA sequencing (RNA-seq) has greatly advanced transcriptomic studies, including system-wide identification and quantification of mRNA isoforms under various biological conditions. A common drawback of the computational methods developed to identify mRNA isoforms from RNA-seq data is that their identified isoforms contain a high percentage of false positives, especially for genes with complex splicing structures, e.g., many exons and exon junctions. More than a decade ago, microarray technologies established a high-throughput platform for identifying and quantifying mRNA isoforms of genes with known sequences. SPACE [3, 4] is a method that uses non-negative matrix factorization (NMF) to predict mRNA isoforms and estimate their abundance from microarray data. Many open questions remain about how to use NMF to accurately find mRNA isoforms from real microarray data. In many scenarios NMF outputs non-unique factorization results [6], which leads SPACE to identify ambiguous mRNA isoforms. This hinders SPACE from discovering novel isoforms that contain novel exons or exon junctions.
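The non-uniqueness of NMF mentioned above can be illustrated with a small numerical sketch. This is not the SPACE or NMFP algorithm; the matrices `W`, `H`, and `Q` and the orientation (exons by isoforms, isoforms by samples) are hypothetical choices made only for illustration. The point is that two different pairs of non-negative factors can reproduce exactly the same data matrix.

```python
import numpy as np

# Hypothetical toy example (all values are assumptions for illustration).
# W: 4 exons x 2 isoforms; a nonzero entry means the exon belongs to the isoform.
W = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# H: 2 isoforms x 3 samples; isoform abundances in each sample.
H = np.array([[2.0, 1.0, 3.0],
              [1.0, 4.0, 2.0]])

# Observed exon-level signal.
V = W @ H

# An invertible mixing matrix Q that is neither a permutation nor a rescaling.
Q = np.array([[1.0, 0.25],
              [0.0, 1.0]])
Q_inv = np.array([[1.0, -0.25],
                  [0.0, 1.0]])

# Alternative factors: V = (W Q)(Q^-1 H), and both stay non-negative here.
W_alt = W @ Q
H_alt = Q_inv @ H

print(np.allclose(W_alt @ H_alt, V))            # same data matrix
print((W_alt >= 0).all(), (H_alt >= 0).all())   # both factors non-negative
print(W_alt[:, 1])                              # second "isoform" now touches exon 2
```

In the alternative factorization, the second column of `W_alt` assigns a nonzero weight to the second exon, which the original second isoform skipped. Both factorizations fit the data equally well, so the data alone cannot tell these two isoform structures apart.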
