Yanagi: Fast and interpretable segment-based alternative splicing and gene expression analysis

Mohamed K Gunady, Stephen M Mount, Héctor Corrada Bravo

doi:10.1186/s12859-019-2947-6

Open Access

Journal: BMC Bioinformatics
Publication Date: Aug 13, 2019
Citations: 3
License type: open-access

Affiliation: University of Maryland, College Park

Abstract

Background: Ultra-fast pseudo-alignment approaches are the tool of choice in transcript-level RNA sequencing (RNA-seq) analyses. Unfortunately, these methods couple the tasks of pseudo-alignment and transcript quantification. This coupling precludes the direct usage of pseudo-alignment to other expression analyses, including alternative splicing or differential gene expression analysis, without including a non-essential transcript quantification step.

Results: In this paper, we introduce a transcriptome segmentation approach to decouple these two tasks. We propose an efficient algorithm to generate maximal disjoint segments given a transcriptome reference library on which ultra-fast pseudo-alignment can be used to produce per-sample segment counts. We show how to apply these maximally unambiguous count statistics in two specific expression analyses – alternative splicing and gene differential expression – without the need of a transcript quantification step. Our experiments based on simulated and experimental data showed that the use of segment counts, like other methods that rely on local coverage statistics, provides an advantage over approaches that rely on transcript quantification in detecting and correctly estimating local splicing in the case of incomplete transcript annotations.

Conclusions: The transcriptome segmentation approach implemented in Yanagi exploits the computational and space efficiency of pseudo-alignment approaches. It significantly expands their applicability and interpretability in a variety of RNA-seq analyses by providing the means to model and capture local coverage variation in these analyses.

Highlights

Ultra-fast pseudo-alignment approaches are the tool of choice in transcript-level RNA sequencing (RNA-seq) analyses
In this paper we have formalized the concept of transcriptome segmentation and proposed an efficient algorithm for generating segment libraries from transcript libraries based on a length parameter L
The resulting segment sequences are used with pseudo-alignment tools to quantify expression at the segment level, providing sufficient information for a variety of expression analyses
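
To make the idea of disjoint segments concrete, the following is a toy sketch only and not the algorithm proposed in the paper. It assumes a hypothetical input format in which each transcript is a list of integer exon-bin identifiers in genomic order, and it ignores the length parameter L and junction-spanning sequence entirely. It groups adjacent bins that are shared by exactly the same set of transcripts, so that every resulting region maps unambiguously to one transcript set.

```python
def disjoint_segments(transcripts):
    """Group exon bins into disjoint regions by transcript membership.

    transcripts: dict mapping a transcript name to a list of integer
    exon-bin ids in genomic order (hypothetical format for illustration).
    Returns a list of (bins, transcript_set) pairs.
    """
    # Record which transcripts contain each exon bin.
    membership = {}
    for name, bins in transcripts.items():
        for b in bins:
            membership.setdefault(b, set()).add(name)

    # Merge consecutive bins that belong to exactly the same transcripts.
    segments = []
    for b in sorted(membership):
        txs = frozenset(membership[b])
        if segments and segments[-1][1] == txs and segments[-1][0][-1] == b - 1:
            segments[-1][0].append(b)
        else:
            segments.append(([b], txs))
    return segments


# Two isoforms of one gene: T2 skips exon bin 3.
example = {"T1": [1, 2, 3, 4], "T2": [1, 2, 4]}
for bins, txs in disjoint_segments(example):
    print(bins, sorted(txs))
```

In this example, bin 3 ends up in its own region belonging only to T1, which is the kind of local signal a splicing analysis would look at; counts for all regions of a gene could be summed for a gene-level analysis.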

Summary

Introduction

Ultra-fast pseudo-alignment approaches are the tool of choice in transcript-level RNA sequencing (RNA-seq) analyses. Much effort in the area has been dedicated to the problem of efficient alignment, or pseudo-alignment, of reads to a genome or a transcriptome, since this is typically a significant computational bottleneck in the analytical process that goes from RNA-seq reads to gene-level expression or differentially expressed transcripts. Among these approaches are alignment techniques such as Bowtie [1], TopHat [2, 3], and Cufflinks [4], and newer techniques such as Sailfish [5], RapMap [6], Kallisto [7] and Salmon [8], which provide efficient strategies through k-mer counting that are much faster while maintaining comparable, or superior, accuracy.
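
As a rough illustration of why k-mer based matching can avoid full alignment, the sketch below builds a k-mer index over a few reference sequences and reports which references are compatible with a read. It is a simplified toy under stated assumptions and does not reproduce the data structures or logic of any of the tools named above; the function names, the example sequences, and the choice to skip k-mers absent from the index are all hypothetical.

```python
from collections import defaultdict


def build_kmer_index(references, k):
    """Map every k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def compatible_references(read, index, k):
    """Intersect the reference sets of all indexed k-mers in the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if not hits:
            continue  # assumption: k-mers missing from the index are skipped
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


# Hypothetical reference sequences, for illustration only.
references = {"t1": "ACGTACGGA", "t2": "ACGTTTGGA"}
index = build_kmer_index(references, k=4)
print(compatible_references("ACGTAC", index, k=4))
```

Here the references could equally be segment sequences rather than whole transcripts, in which case counting the reads compatible with each segment would give per-segment counts.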

Methods

Results

Discussion

Conclusion