Alignment and mapping methodology influence transcript abundance estimation

Avi Srivastava, Laraib Malik, Charlotte Soneson, Rob Patro, Carl Kingsford, Mohsen Zakeri, Michael I Love, Hirak Sarkar, Fatemeh Almodaresi

doi:10.1186/s13059-020-02151-8

Abstract

Background: The accuracy of transcript quantification using RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model being adopted. While the choice of quantification model has been shown to be important, considerably less attention has been given to comparing the effect of various read alignment approaches on quantification accuracy.

Results: We investigate the influence of mapping and alignment on the accuracy of transcript quantification in both simulated and experimental data, as well as the effect on subsequent differential expression analysis. We observe that, even when the quantification model itself is held fixed, the effect of choosing a different alignment methodology, or aligning reads using different parameters, on quantification estimates can sometimes be large and can affect downstream differential expression analyses as well. These effects can go unnoticed when assessment is focused too heavily on simulated data, where the alignment task is often simpler than in experimentally acquired samples. We also introduce a new alignment methodology, called selective alignment, to overcome the shortcomings of lightweight approaches without incurring the computational cost of traditional alignment.

Conclusion: We observe that, on experimental datasets, the performance of lightweight mapping and alignment-based approaches varies significantly, and highlight some of the underlying factors. We show this variation both in terms of quantification and downstream differential expression analysis. In all comparisons, we also show the improved performance of our proposed selective alignment method and suggest best practices for performing RNA-seq quantification.

Highlights

The accuracy of transcript quantification using RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model being adopted.
We chose STAR, in particular, since it has the ability to project the aligned reads to transcriptomic coordinates, which allowed us to use a consistent quantification method, and because it is part of the popular STAR [19]/RSEM [11] transcript abundance estimation pipeline.
We proposed and benchmarked a new hybrid alignment method, selective alignment (SA), which provides an efficient alternative to lightweight mapping that produces results much closer to what is obtained by performing traditional alignment.

Summary

Introduction

The accuracy of transcript quantification using RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model being adopted. Popular methods for transcript quantification [4, 5, 6, 7, 11, 13, 14, 15, 16] differ in many aspects, ranging from how they handle read mapping and alignment, to the optimization algorithms they employ, to differences in their generative models or which biases they attempt to model and correct. These differences are often obscured when analyzing simulated data, since aspects of experimental data that can lead to substantial divergence in quantification estimates are not always properly recapitulated in simulations.

Journal: Genome Biology
Publication Date: Sep 7, 2020
Citations: 117
License type: open-access
