Abstract

RNA sequencing has become the standard technique for high resolution genome-wide monitoring of gene expression. As such, it often comprises the first step towards understanding complex molecular mechanisms driving various phenotypes, spanning organ development to disease genesis, monitoring and progression. An advantage of RNA sequencing is its ability to capture complex transcriptomic events such as alternative splicing which results in alternate isoform abundance. At the same time, this advantage remains algorithmically and computationally challenging, especially with the emergence of even higher resolution technologies such as single-cell RNA sequencing. Although several algorithms have been proposed for the effective detection of differential isoform expression from RNA-Seq data, no widely accepted golden standards have been established. This fact is further compounded by the significant differences in the output of different algorithms when applied on the same data. In addition, many of the proposed algorithms remain scarce and poorly maintained. Driven by these challenges, we developed a novel integrative approach that effectively combines the most widely used algorithms for differential transcript and isoform analysis using state-of-the-art machine learning techniques. We demonstrate its usability by applying it on simulated data based on several organisms, and using several performance metrics; we conclude that our strategy outperforms the application of the individual algorithms. Finally, our approach is implemented as an R Shiny application, with the underlying data analysis pipelines also available as docker containers.

Highlights

  • Introduction published maps and institutional affilA common application of next-generation sequencing (NGS) is RNA-sequencing (RNA-Seq), which is the genome-wide monitoring of gene expression across organisms, tissues, and lately, single cells [1]

  • We recently showed evidence that PANDORA is insusceptible to biases introduced by different normalization frameworks, showed improved performance when applied to low count RNA molecules, such as lncRNAs and was able to control the propagation of gene length bias into downstream enrichment analysis [20]

  • In order to effectively combine the outputs of several differential isoform expression analysis (DIEA) algorithms and integrate the outcome of each individual tool towards a joint solution reflecting the advantages of each algorithm as well as mitigating the respective disadvantages, we developed an integration workflow based on established top-performing machine learning (ML) methodologies and tested it on simulated data

Read more

Summary

Introduction

A common application of next-generation sequencing (NGS) is RNA-sequencing (RNA-Seq), which is the genome-wide monitoring of gene expression across organisms, tissues, and lately, single cells [1]. Due to its higher signal resolution, lower amounts of required input material, wider dynamic range of measurements [2] and greater range of applications, RNA-Seq has become the standard technique for such studies. The continuously dropping sequencing costs and the emergence of new technologies and less costly protocols such as Quant-Seq 30 mRNA sequencing [3], have contributed to an ever-expanding landscape of data generation in order to characterize gene expression changes across tissues, organs, biological conditions and whole organisms. RNA-Seq has been deployed for a plethora of applications including but not limited to de novo transcriptome assembly [4], investigation of alternative splicing [5], discovery of fused transcripts [6], interrogation of allele-specific gene expression [7], genetic variant iations.

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call