Making the most of RNA-seq: Pre-processing sequencing data with Opossum for reliable SNP variant detection.

Laura Oikkonen, Stefano Lise

doi:10.12688/wellcomeopenres.10501.2

Abstract

Identifying variants from RNA-seq (transcriptome sequencing) data is a cost-effective and versatile alternative to whole-genome sequencing. However, current variant callers do not generally behave well with RNA-seq data due to reads encompassing intronic regions. We have developed a software programme called Opossum to address this problem. Opossum pre-processes RNA-seq reads prior to variant calling, and although it has been designed to work specifically with Platypus, it can be used equally well with other variant callers such as GATK HaplotypeCaller. In this work, we show that using Opossum in conjunction with either Platypus or GATK HaplotypeCaller maintains precision and improves the sensitivity for SNP detection compared to the GATK Best Practices pipeline. In addition, using it in combination with Platypus offers a substantial reduction in run times compared to the GATK pipeline so it is ideal when there are only limited time or computational resources available.

Highlights

RNA-seq[1] is routinely employed for gene expression analysis, but it can be used to identify genomic variants in expressed regions alongside whole-exome (WES) and whole-genome sequencing (WGS).
We show that using Opossum in conjunction with either Platypus or GATK HaplotypeCaller maintains precision and improves the sensitivity for SNP detection compared to the GATK Best Practices pipeline.
A few pipelines for detecting SNPs in RNA-seq data have been released to address these challenges. eSNV-detect by Tang et al.[3] employs a combination of mappers to overcome systematic errors of individual aligners, followed by variant calling with Samtools and Bcftools.

Summary

Introduction

RNA-seq (transcriptome sequencing)[1] is routinely employed for gene expression analysis, but it can be used to identify genomic variants in expressed regions alongside whole-exome (WES) and whole-genome sequencing (WGS). The approach works well on length scales of up to a few kilobases (typically up to 1.5–2 kb), but longer reads (e.g. reads mapping across large introns) would disrupt it. For this reason, Platypus should not be run directly on RNA-seq data. We have developed a software tool called Opossum[6] to process and filter RNA-seq data and make it suitable for (haplotype-based) variant calling. The presence of splice junctions in RNA-seq data means that reads which have been mapped across splice junctions must be split to remove intronic parts, which would otherwise disrupt variant calling. Our approach shows promising results, maintaining high precision and improving sensitivity in detecting SNP variant calls compared to the GATK Best Practices pipeline. We have used the strongly validated GIAB (Genome in a Bottle) dataset[10].
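To make the idea of splitting reads at splice junctions concrete, the following is a minimal sketch in plain Python. It is not Opossum's implementation. It assumes that the aligner reports a spliced alignment as a SAM-style CIGAR string in which `N` marks the skipped (intronic) stretch of the reference, and it cuts the read into one segment per exonic part. The function name `split_at_junctions` and its return format are hypothetical, chosen only for illustration.

```python
import re
from typing import List, Tuple

# CIGAR operations that consume read bases and/or reference bases (SAM convention).
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REF = set("MDN=X")


def split_at_junctions(pos: int, cigar: str, seq: str) -> List[Tuple[int, str, str]]:
    """Split one spliced alignment at every N (skipped region) operation.

    pos is the 1-based leftmost reference position, as in a SAM record.
    Returns one (start, cigar, sequence) tuple per exonic segment.
    Hypothetical helper for illustration only.
    """
    segments = []
    ref_pos, query_pos = pos, 0
    seg_start, seg_query_start, seg_ops = pos, 0, []
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "N":
            # Close the current segment, then jump over the intron.
            if seg_ops:
                segments.append(
                    (seg_start, "".join(seg_ops), seq[seg_query_start:query_pos])
                )
            ref_pos += n
            seg_start, seg_query_start, seg_ops = ref_pos, query_pos, []
            continue
        seg_ops.append(f"{n}{op}")
        if op in CONSUMES_QUERY:
            query_pos += n
        if op in CONSUMES_REF:
            ref_pos += n
    if seg_ops:
        segments.append((seg_start, "".join(seg_ops), seq[seg_query_start:query_pos]))
    return segments


# A 10-base read aligned as 5 bases, a 1000-base intron, then 5 more bases.
print(split_at_junctions(100, "5M1000N5M", "AAAAACCCCC"))
```

Each returned segment no longer spans the intron, so a haplotype-based caller would see only short, contiguous alignments. A real pre-processing step would also need to carry over base qualities, flags and mate information, which this sketch leaves out.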

Methods

Results



Journal: Wellcome Open Research
Publication Date: Mar 17, 2017
Citations: 39
License type: CC BY 4.0


