Abstract
Inference of absolute copy numbers in tumor genomes is one of the key points in the study of tumor genesis. However, the mixture of tumor and normal cells poses a big challenge to this task. Accurate estimation of tumor purity (i.e., the fraction of tumor cells) is a necessary step to solve this problem. In this paper, we propose a new approach, AITAC, to accurately infer tumor purity and absolute copy numbers in a tumor sample by using high-throughput sequencing (HTS) data. In contrast to many existing algorithms for estimating tumor purity, which usually rely on pre-detected mutation genotypes (heterogeneity and homogeneity), AITAC just requires read depths (RDs) observed at the regions with copy number losses. AITAC creates a non-linear model to correlate tumor purity, observed and expected RDs. It adopts an exhaustive search strategy to scan tumor purity in a wide range, and chooses the tumor purity that minimizes the deviation between observed RDs and expected ones as the optimal solution. We apply the proposed approach to both simulation and real sequencing data sets and demonstrate its performance by comparing with two classical approaches. AITAC is freely available at https://github.com/BDanalysis/aitac and can be expected to become a useful approach for researchers to analyze copy numbers in cancer genome.
Highlights
DNA copy number variation (CNV), as a type of structural variations, accounts for the majority of genomic mutations in cancers (McCarroll and Altshuler, 2007)
Simulation study is a reasonable way to evaluate the performance of computational algorithms in CNV detection (Yuan et al, 2017)
We proposed a new method, AITAC, for the inference of tumor purity and absolute copy numbers by using high-throughput sequencing (HTS) data
Summary
DNA copy number variation (CNV), as a type of structural variations, accounts for the majority of genomic mutations in cancers (McCarroll and Altshuler, 2007). Contamination of normal cells in tumor tissues makes the observed magnitudes of signals in CNV regions diminished, which will lead to a decreased power in the detection of genomic mutations if the fraction is unknown (Yuan et al, 2012, 2017, 2018a). The contamination of normal cells in tumor tissues can lead to adverse effects on subsequent genomic analysis, and further poses effects on patient’s condition analysis in clinical practice.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have