Abstract

Long-read RNA sequencing (RNA-seq) technologies can sequence full-length transcripts, which gives them an advantage over short-read RNA-seq for exploring isoform-specific gene expression. We present LIQA, a method that quantifies isoform expression and detects differential alternative splicing (DAS) events using long-read direct mRNA sequencing or cDNA sequencing data. LIQA incorporates base-pair quality scores and isoform-specific read length information into a survival model to assign different weights to reads, and it uses an expectation-maximization (EM) algorithm for parameter estimation. We apply LIQA to long-read RNA-seq data from the Universal Human Reference, acute myeloid leukemia, and esophageal squamous epithelial cells and demonstrate its high accuracy in profiling alternative splicing events.
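
The abstract names two per-read ingredients, base quality and read length, without giving LIQA's survival model itself. As a hedged illustration of those ingredients only, the sketch below converts a FASTQ quality string into a mean base accuracy using the standard Phred relation P_err = 10^(-Q/10), and builds an empirical survival function S(l) = P(L >= l) from observed read lengths. The helper names `phred_to_accuracy` and `empirical_survival`, the Phred+33 encoding, and the use of mean accuracy as a read weight are assumptions for illustration; this is not LIQA's actual weighting formula.

```python
from bisect import bisect_left
from typing import Callable, List


def phred_to_accuracy(quality: str, offset: int = 33) -> float:
    """Mean per-base accuracy of one read (hypothetical weighting helper).

    Assumes Phred+33 encoding: base quality Q = ord(char) - offset, and the
    probability that the base call is wrong is 10^(-Q/10).
    """
    if not quality:
        raise ValueError("empty quality string")
    accuracies = [1.0 - 10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return sum(accuracies) / len(accuracies)


def empirical_survival(read_lengths: List[int]) -> Callable[[float], float]:
    """Return S(l), the fraction of reads whose length is >= l (hypothetical helper)."""
    if not read_lengths:
        raise ValueError("no read lengths given")
    lengths = sorted(read_lengths)
    n = len(lengths)

    def S(l: float) -> float:
        # bisect_left counts the reads strictly shorter than l
        return (n - bisect_left(lengths, l)) / n

    return S
```

For example, a read whose bases all have Q = 20 (character "5" under Phred+33) gets a mean accuracy of 0.99, while S built from lengths [100, 200, 300, 400] gives S(250) = 0.5. How LIQA actually combines such quantities into its survival model is not described in this section.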

Highlights

  • RNA splicing is a major mechanism for generating transcriptomic variations, and misregulation of splicing is associated with a large array of human diseases caused by hereditary and somatic mutations [1,2,3,4,5]

  • Because of limited read length, it is difficult to characterize transcripts accurately using short reads: 81% of isoforms in the GENCODE annotation are longer than 500 bp. This fragmented sequencing of RNA/cDNA molecules introduces biases and hinders the correct mapping of short reads to the reference genome, a step that is crucial for estimating gene or isoform expression and for detecting novel or unique isoforms

  • Because the isoform of origin is unobserved for some reads, an expectation-maximization (EM) algorithm is used to estimate relative isoform abundances
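
The EM step described in the last bullet can be sketched as follows. This is a generic isoform-abundance EM, not LIQA's exact likelihood: it assumes each read comes with a precomputed likelihood P(read | isoform), set to zero for incompatible isoforms, and an optional nonnegative per-read weight (for example, one derived from base qualities). The function name `em_isoform_abundance` and its parameters are hypothetical. The E-step splits each read across its compatible isoforms in proportion to current abundance times likelihood; the M-step re-estimates abundances as normalized weighted read shares.

```python
import numpy as np


def em_isoform_abundance(likelihood, weights=None, n_iter=500, tol=1e-10):
    """Estimate relative isoform abundances theta by expectation-maximization.

    likelihood: (n_reads, n_isoforms) array of P(read r | isoform k); 0 = incompatible.
    weights:    optional (n_reads,) nonnegative per-read weights (hypothetical input).
    """
    L = np.asarray(likelihood, dtype=float)
    n_reads, n_iso = L.shape
    w = np.ones(n_reads) if weights is None else np.asarray(weights, dtype=float)
    theta = np.full(n_iso, 1.0 / n_iso)  # start from uniform abundances
    for _ in range(n_iter):
        # E-step: posterior probability that read r originated from isoform k
        joint = L * theta
        denom = joint.sum(axis=1, keepdims=True)
        resp = np.divide(joint, denom, out=np.zeros_like(joint), where=denom > 0)
        # M-step: abundances proportional to weighted expected read counts
        counts = (w[:, None] * resp).sum(axis=0)
        new_theta = counts / counts.sum()
        converged = np.abs(new_theta - theta).max() < tol
        theta = new_theta
        if converged:
            break
    return theta
```

As a usage example, with two reads unique to isoform 1, one read unique to isoform 2, and one read compatible with both (likelihood rows [1, 0], [1, 0], [0, 1], [1, 1]), the iteration converges to abundances of about 2/3 and 1/3: the shared read is eventually assigned in the same proportion as the unique evidence.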


Summary

Introduction

RNA splicing is a major mechanism for generating transcriptomic variations, and misregulation of splicing is associated with a large array of human diseases caused by hereditary and somatic mutations [1,2,3,4,5]. Because of limited read length, it is difficult to characterize transcripts accurately using short reads: 81% of isoforms in the GENCODE annotation are longer than 500 bp (median = 1543 bp, mean = 2121 bp). This fragmented sequencing of RNA/cDNA molecules introduces biases and hinders the correct mapping of short reads to the reference genome, a step that is crucial for estimating gene or isoform expression and for detecting novel or unique isoforms. A number of computational tools, including RSEM [9], eXpress [10], TIGAR2 [11], Salmon [12], Sailfish

