Abstract

Variant allele frequencies (VAFs) are an important measure of genetic variation that can be estimated at single-nucleotide variant (SNV) sites. RNA and DNA VAFs are used as indicators of a wide range of biological traits, including tumor purity and ploidy changes, allele-specific expression, and gene-dosage transcriptional response. Here we present a novel methodology to assess gene and chromosomal allele asymmetries and to aid in identifying genomic alterations in RNA and DNA datasets. Our approach is based on analysis of the VAF distributions in chromosomal segments (continuous multi-SNV genomic regions). In each segment we estimate variant probability, a parameter of a random process that can generate synthetic VAF samples that closely resemble the observed data. We show that variant probability is a biologically interpretable quantitative descriptor of the VAF distribution in chromosomal segments and that it is consistent with other approaches. To this end, we apply the proposed methodology to data from 72 samples obtained from patients with breast invasive carcinoma (BRCA) from The Cancer Genome Atlas (TCGA). We compare DNA and RNA VAF distributions from matched RNA and whole exome sequencing (WES) datasets and find that both genomic signals give very similar segmentation and estimated variant probability profiles. We also find a correlation between variant probability and copy number alterations (CNA). Finally, to demonstrate a practical application of variant probabilities, we use them to estimate tumor purity. Tumor purity estimates based on variant probabilities demonstrate good concordance with other approaches (Pearson's correlation between 0.44 and 0.76). Our evaluation suggests that variant probabilities can serve as a dependable descriptor of VAF distribution, further enabling the statistical comparison of matched DNA and RNA datasets. They also provide conceptual and mechanistic insights into the relations between the structure of VAF distributions and genetic events. The methodology is implemented in GeTallele, a Matlab toolbox that provides a suite of functions for analysis, statistical assessment, and visualization of Genome and Transcriptome allele frequency distributions. GeTallele is available at: https://github.com/SlowinskiPiotr/GeTallele.
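
To make the idea of variant probability more concrete, the Python sketch below estimates it for one segment by generating synthetic VAF samples and keeping the parameter value whose samples most closely resemble the observed ones. This is not the GeTallele implementation (which is a Matlab toolbox). The generative model (binomial read sampling, with the variant sitting on either allele with equal chance), the grid search, the use of the two-sample Kolmogorov-Smirnov statistic as the distance, and all function names and parameter values are assumptions made for illustration only.

```python
import random


def synthetic_vafs(v_pr, depths, rng):
    """Draw one synthetic VAF per SNV site.

    Assumed model (not taken from the paper): at each site the variant
    allele is read with probability v_pr or 1 - v_pr, chosen with equal
    chance, and the reads are sampled binomially at the given depth.
    """
    vafs = []
    for n in depths:
        p = v_pr if rng.random() < 0.5 else 1.0 - v_pr
        variant_reads = sum(rng.random() < p for _ in range(n))
        vafs.append(variant_reads / n)
    return vafs


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (largest gap between ECDFs)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare ECDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def estimate_v_pr(observed, depths, grid=None, n_rep=5, seed=0):
    """Grid search for the v_pr whose synthetic VAFs best match `observed`."""
    rng = random.Random(seed)
    if grid is None:
        # v_pr and 1 - v_pr give the same mixture, so 0.5..1.0 is enough.
        grid = [round(0.5 + 0.02 * k, 2) for k in range(26)]
    best_v, best_d = None, float("inf")
    for v in grid:
        synth = []
        for _ in range(n_rep):
            synth.extend(synthetic_vafs(v, depths, rng))
        d = ks_statistic(observed, synth)
        if d < best_d:
            best_v, best_d = v, d
    return best_v, best_d


# Toy segment: 300 SNV sites at read depth 80, simulated with v_pr = 0.7.
depths = [80] * 300
observed = synthetic_vafs(0.7, depths, random.Random(1))
v_hat, distance = estimate_v_pr(observed, depths, seed=2)
print(f"estimated variant probability: {v_hat:.2f} (KS distance {distance:.3f})")
```

In this toy setting a balanced segment corresponds to a value of 0.5, and larger values indicate a stronger allelic asymmetry. A real analysis would use the per-site read depths and VAFs of the segment instead of simulated input.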

Highlights

  • RNA and DNA carry and present genetic variation in related yet distinct manners; these differences encode information about functional and structural traits

  • We excluded from further analysis 294 chromosomal segments where either tumor exome or transcriptome had vPR ≥ 0.58 but their variant allele frequency (VAF) distribution could not be differentiated from the model VAF distributions with vPR = 0.5 (p > 1e−5, Kolmogorov-Smirnov test, equivalent to Bonferroni family-wise error rate (FWER) correction for 100,000 comparisons)

  • In the remaining 2,403 chromosomal segments, we systematically examined the similarity between corresponding variant allele frequency in tumor exome sequence (VAFTEX), variant allele frequency in tumor transcriptome sequence (VAFTTR), and copy number alterations (CNA)
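
The exclusion rule in the highlights above can be sketched as a small filter. The thresholds (vPR ≥ 0.58 and p > 1e−5) come from the text. The `SegmentCall` record, its field names, the function names, and the example values are hypothetical, and the sketch reads the rule as "exclude a segment if, in either dataset, vPR is elevated but the Kolmogorov-Smirnov test cannot distinguish its VAF distribution from the vPR = 0.5 model", which is an interpretation rather than a confirmed detail of the authors' pipeline. The p-values are assumed to be computed elsewhere.

```python
from dataclasses import dataclass

VPR_CUTOFF = 0.58    # from the text
P_THRESHOLD = 1e-5   # from the text


@dataclass
class SegmentCall:
    """Hypothetical per-dataset summary of one chromosomal segment."""
    v_pr: float           # estimated variant probability
    p_vs_balanced: float  # KS p-value against the model with vPR = 0.5


def is_unreliable(call: SegmentCall) -> bool:
    """Elevated vPR that is not distinguishable from the balanced model."""
    return call.v_pr >= VPR_CUTOFF and call.p_vs_balanced > P_THRESHOLD


def keep_segment(exome: SegmentCall, transcriptome: SegmentCall) -> bool:
    """Keep the segment only if neither dataset triggers the exclusion."""
    return not (is_unreliable(exome) or is_unreliable(transcriptome))


# Made-up example values, for illustration only.
confident = SegmentCall(v_pr=0.70, p_vs_balanced=1e-12)
doubtful = SegmentCall(v_pr=0.60, p_vs_balanced=0.01)
balanced = SegmentCall(v_pr=0.50, p_vs_balanced=0.40)

print(keep_segment(confident, confident))  # both calls well supported
print(keep_segment(confident, doubtful))   # transcriptome call is doubtful
print(keep_segment(balanced, balanced))    # vPR below the cutoff in both
```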


Summary

Introduction

RNA and DNA carry and present genetic variation in related yet distinct manners; these differences encode information about functional and structural traits. RNA-DNA allele comparisons from sequencing have mostly been approached at the nucleotide level, where they have proven to be highly informative for determining the allelic functional consequences (ENCODE Project Consortium, 2012; Ha et al, 2012; Shah et al, 2012; Morin et al, 2013; Han et al, 2015; Ferreira et al, 2016; Macaulay et al, 2016; Movassagh et al, 2016; Reuter et al, 2016; Shi et al, 2016; Shlien et al, 2016; Yang et al, 2016). Integration of allele signals at the molecular level, as derived from linear DNA and RNA, has been less comprehensively explored because of the limited compatibility of the outputs from the two sequencing assays.

