Comprehensive evaluation of RNA-seq analysis pipelines in diploid and polyploid species.

Miriam Payá-Milans, Margaret Staton, Timothy A Rinehart, Gerardo Nunez, James W Olmstead

doi:10.1093/gigascience/giy132

Abstract

Background: The usual analysis of RNA sequencing (RNA-seq) reads is based on an existing reference genome and annotated gene models. However, when a reference for the sequenced species is not available, alternatives include using a reference genome from a related species or reconstructing transcript sequences with de novo assembly. In addition, researchers are faced with many options for RNA-seq data processing and limited information on how their decisions will impact the final outcome. Using both a diploid and polyploid species with a distant reference genome, we have tested the influence of different tools at various steps of a typical RNA-seq analysis workflow on the recovery of useful processed data available for downstream analysis.

Findings: At the preprocessing step, we found error correction has a strong influence on de novo assembly but not on mapping results. After trimming, a greater percentage of reads could be used in downstream analysis by selecting gentle quality trimming performed with Skewer instead of strict quality trimming with Trimmomatic. This availability of reads correlated with size, quality, and completeness of de novo assemblies and with number of mapped reads. When selecting a reference genome from a related species to map reads, outcome was significantly improved when using mapping software tolerant of greater sequence divergence, such as Stampy or GSNAP.

Conclusions: The selection of bioinformatic software tools for RNA-seq data analysis can maximize quality parameters on de novo assemblies and availability of reads in downstream analysis.

Highlights

>The idea of comparing different assembly and mapping strategies is compelling.
Since the mappings are already done, you could explore in more detail how multiple homoeologues may be mapping to the same "unigene", or you could try to figure out whether the homoeologues are removed or merged into single unigenes.
If that is the case, you may be mapping the tetraploid to a reference closer to a diploid.

Summary

Introduction

>The idea of comparing different assembly and mapping strategies is compelling. It is true that there are few resources about the effects of polyploidy on tools designed mostly for diploids. It would be nice to have a table with all that information summarized, including one column with a short description of the final effect in each step of the analysis. A row could look like this (with more rows, one for each step in the pipeline):
>Tool: Trimmomatic
>De Novo Assembly: Improves by 5% on VC (or whatever you find)
>Mapping to genome: Limited effect.
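As a rough illustration of the summary table suggested in this comment, the sketch below lays out one row per tool with the two effect columns from the reviewer's example. The class name `ToolEffect`, the function `render_table`, and the cell texts are assumptions made for illustration. The cells are placeholders only and do not report any result from the paper.

```python
from dataclasses import dataclass


@dataclass
class ToolEffect:
    """One row of the suggested summary table (hypothetical structure)."""
    tool: str
    de_novo_assembly: str
    mapping_to_genome: str


def render_table(rows):
    """Render rows as a plain-text table with padded columns."""
    headers = ["Tool", "De novo assembly", "Mapping to genome"]
    body = [[r.tool, r.de_novo_assembly, r.mapping_to_genome] for r in rows]
    # Column width is the longest cell in that column, header included.
    widths = [
        max(len(line[i]) for line in [headers] + body)
        for i in range(len(headers))
    ]

    def fmt(cells):
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(headers), "-+-".join("-" * w for w in widths)]
    lines.extend(fmt(cells) for cells in body)
    return "\n".join(lines)


# Placeholder row: the effect descriptions are NOT measured values.
rows = [
    ToolEffect(
        tool="Trimmomatic",
        de_novo_assembly="(effect to be filled in)",
        mapping_to_genome="(effect to be filled in)",
    ),
]
table = render_table(rows)
print(table)
```

Adding further `ToolEffect` entries, one for each step of the pipeline, would extend the table in the way the comment describes.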

Results

Conclusion