Abstract
RNA sequencing experiments generate large amounts of information about expression levels of genes. Although they are mainly used for quantifying expression levels, they contain much more biologically important information, such as copy number variants (CNVs). Here, we present CaSpER, a signal processing approach for identification, visualization, and integrative analysis of focal and large-scale CNV events in multiscale resolution using either bulk or single-cell RNA sequencing data. CaSpER integrates the multiscale smoothing of the expression signal and the allelic shift signal for CNV calling. The allelic shift signal measures loss of heterozygosity (LOH), which is valuable for CNV identification. CaSpER employs an efficient methodology for generating a genome-wide B-allele frequency (BAF) signal profile from the reads and uses it to correct CNV calls. CaSpER increases the utility of RNA sequencing datasets and complements other tools for complete characterization and visualization of the genomic and transcriptomic landscape of single-cell and bulk RNA sequencing data.
Highlights
RNA sequencing experiments generate large amounts of information about expression levels of genes
In order to eliminate the noise in the initial expression signal profile, CaSpER performs sliding window-based median filtering and computes the N-level multiscale decomposition of the expression signal in multiple window length scales, where N denotes the number of smoothing scales
The states correspond to the copy number variant (CNV) states: 1, homozygous deletion; 2, heterozygous deletion; 3, neutral; 4, one-copy amplification; 5, high-copy amplification
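The smoothing step described in the highlights can be illustrated with a minimal sketch. It is not CaSpER's implementation: the function names, the window schedule (3, 5, 7, ...), the edge padding, and the choice to smooth each scale from the previous one are all assumptions made for illustration. Only the general idea, sliding-window median filtering repeated at N window-length scales, comes from the text.

```python
import numpy as np


def median_filter(signal, window):
    """Sliding-window median; edges are padded by repeating the end values."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = np.pad(signal, half, mode="edge")
    # One row per position, each row holding the values inside the window.
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)


def multiscale_decomposition(signal, n_scales, base_window=3):
    """Return a list of n_scales smoothed copies of the expression signal.

    Assumption: the window grows by 2 at each scale and each scale is
    computed from the previous one. The source does not specify either.
    """
    smoothed = np.asarray(signal, dtype=float)
    scales = []
    for level in range(n_scales):
        window = base_window + 2 * level  # hypothetical schedule: 3, 5, 7, ...
        smoothed = median_filter(smoothed, window)
        scales.append(smoothed)
    return scales


# Toy expression profile along a chromosome: a one-gene spike (noise)
# followed by a sustained shift (a candidate large-scale event).
profile = [0, 0, 5, 0, 0, 0, 1, 1, 1, 1, 1, 1]
for level, scale in enumerate(multiscale_decomposition(profile, n_scales=3), 1):
    print(level, scale)
```

In this toy profile the isolated spike is removed at the first scale while the sustained shift survives, which is the property that makes median smoothing useful before assigning segments to CNV states.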
Summary
RNA sequencing experiments generate large amounts of information about expression levels of genes. We present CaSpER, a signal processing approach for identification, visualization, and integrative analysis of focal and large-scale CNV events in multiscale resolution using either bulk or single-cell RNA sequencing data. RNA-seq data have been used to identify single nucleotide polymorphisms (SNPs) and short indels [11,12,13]. Identifying these variants makes RNA-seq experiments considerably more useful than gene expression quantification alone: researchers can integrate the portion of the tumor cells' genomic landscape that RNA-seq reveals with the transcriptomic landscape, rather than studying the transcriptomic landscape in isolation. CaSpER broadens the potential use cases of RNA-seq datasets because it can use RNA-seq data to probe the CNV landscape of the cells in addition to their transcriptomic landscape.