Abstract
Rapid development of next generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA but, no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed the eSNV-Detect, a novel computational system, which utilizes data from multiple aligners to call, even at low read depths, and rank variants from RNA-Seq. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell-line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6–96.8% precision and 91.6–95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MD-MB231 breast cancer cell-line delineated variant heterogeneity among the single-cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/.
Highlights
The advent of generation sequencing technologies has revolutionized both basic science and medicine; comprehensive understanding of genomic and transcriptomic variants provides clues to novel biological mechanisms and molecular basis of complex diseases [1,2]
We have provided details of parameters used for developing the expressed single nucleotide variants (eSNVs)-Detect method and have presented a variety of analyses to show the accuracy of variant calling from non-directional paired-end RNA-Seq data
To determine sensitivity and specificity of the eSNV-Detect method, we have investigated the variants from a set of 25 The Cancer Genome Atlas (TCGA) ER+ breast tumors for which we have RNA and exome sequencing datasets along with a 1000 genome individual for whom we have RNA-Seq and single-nucleotide polymorphisms (SNP) chip dataset
Summary
The advent of generation sequencing technologies has revolutionized both basic science and medicine; comprehensive understanding of genomic and transcriptomic variants provides clues to novel biological mechanisms and molecular basis of complex diseases [1,2]. Discovery of single nucleotide variants (SNVs) from genomics and transcriptomics plays a significant role for treatment of disease [3]. Germline and somatic SNVs from genomics and transcriptomics sequencing studies in cancers have allowed us to define mutational landscape of tumors [4,5,6,7]. Genomic features such as gene expression, transcript expression, novel isoforms, fusion transcripts, expressed single nucleotide variants (eSNVs), circular RNAs, non-coding RNAs (long non-coding RNAs and small RNAs) can be obtained from RNA-Seq data [8].
Published Version (
Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have