Abstract

Genetic heterogeneity in a mixed sample of tumor and normal DNA can confound characterization of the tumor genome. Numerous computational methods have been proposed to detect aberrations in DNA samples from tumor and normal tissue mixtures, but most require tumor purities of at least 10–15%. Here, we present a statistical model that captures information, contained in the individual's germline haplotypes, about expected patterns in the B allele frequencies from SNP microarrays while fully modeling their magnitude; it is the first such model for SNP microarray data. Our model consists of a pair of hidden Markov models, one for the germline and one for the tumor genome, which, conditional on the observed array data and patterns of population haplotype variation, have a dependence structure induced by the relative imbalance of an individual's inherited haplotypes. Together, these hidden Markov models offer a powerful approach for dealing with mixtures of DNA where the main component represents the germline. This suggests two natural applications: characterizing primary clones when stromal contamination is extremely high, and identifying lesions in rare subclones of a tumor when tumor purity is sufficient to characterize the primary lesions. Our joint model for germline haplotypes and acquired DNA aberration is flexible, accommodating a wide range of chromosomal alterations, including balanced and imbalanced losses and gains, copy-neutral loss-of-heterozygosity (LOH) and tetraploidy. In a simulated 3% mixture sample, we found our model (which we term J-LOH) to be superior for localizing rare aberrations. More generally, our model provides a framework for full integration of the germline and tumor genomes to deal more effectively with missing or uncertain features, and thus extract maximal information from difficult scenarios where existing methods fail.

Highlights

  • Identification of DNA copy number aberrations and loss of heterozygosity (LOH) in known or potential cancer-related genomic regions offers the potential for application in basic or translational science

  • Inference of aberrations present in the DNA from heterogeneous mixtures of cells requires intermediate data features from single-nucleotide polymorphism (SNP) arrays, i.e. the B allele frequency (BAF, the proportion of the "B" allele in the sample) and log R ratio (LRR, indicative of total copy number), since genotype calls alone may be unaffected by the presence of a small proportion of aberrant cells

  • Methods for detecting chromosomal aberrations that result in allelic imbalance within a heterogeneous sample have previously been proposed that use the dispersion of within-sample allele frequencies measured at germline heterozygous positions

Introduction

Identification of DNA copy number aberrations and loss of heterozygosity (LOH) in known or potential cancer-related genomic regions offers the potential for application in basic or translational science. Because tissue dissection has limits, or is impractical in some settings (e.g. high vascularity or hematological cancers), a DNA sample may exhibit genetic heterogeneity resulting from the mixture of tumor and normal tissues or from subclonal structure. In such cases, fully characterizing the genomes present in individual tissues or clones becomes difficult. Inference of aberrations present in the DNA from heterogeneous mixtures of cells requires intermediate data features from SNP arrays, i.e. the B allele frequency (BAF, the proportion of the "B" allele in the sample) and log R ratio (LRR, indicative of total copy number), since genotype calls alone may be unaffected by the presence of a small proportion of aberrant cells. The main strategies for accommodating this BAF pattern are use of a two-component mixture distribution and mirroring
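
To make the scale of the problem concrete, the following is a minimal sketch of the idealized BAF and LRR expected at a germline-heterozygous (AB) SNP when a fraction of the cells carry an aberration. It is not the J-LOH model: the function names, the assumption of a diploid normal component, and the noise-free, uncompressed LRR are all illustrative assumptions. The `mirrored_baf` helper shows one common way of folding BAF values around 0.5, which is what "mirroring" is taken to mean here.

```python
import math


def expected_baf(tumor_fraction, tumor_a, tumor_b):
    """Idealized BAF at a germline AB site in a tumor/normal mixture.

    tumor_fraction: proportion of cells carrying the aberration (0 to 1).
    tumor_a, tumor_b: copies of the A and B alleles in aberrant cells.
    Normal cells are assumed diploid and heterozygous (one A, one B).
    """
    b_copies = tumor_fraction * tumor_b + (1.0 - tumor_fraction) * 1.0
    total = tumor_fraction * (tumor_a + tumor_b) + (1.0 - tumor_fraction) * 2.0
    return b_copies / total


def expected_lrr(tumor_fraction, tumor_a, tumor_b):
    """Idealized log R ratio: log2 of average total copy number over 2.

    Real array LRR values are noisier and typically compressed relative
    to this value; this is an assumption made for illustration only.
    """
    total = tumor_fraction * (tumor_a + tumor_b) + (1.0 - tumor_fraction) * 2.0
    return math.log2(total / 2.0)


def mirrored_baf(baf):
    """Fold a BAF value around 0.5 so that imbalance is always >= 0.5."""
    return 0.5 + abs(baf - 0.5)


if __name__ == "__main__":
    # Hypothetical scenarios at 3% aberrant cells: (label, A copies, B copies).
    scenarios = [
        ("hemizygous loss of B", 1, 0),
        ("copy-neutral LOH (AA)", 2, 0),
        ("single-copy gain of B", 1, 2),
    ]
    for label, a, b in scenarios:
        baf = expected_baf(0.03, a, b)
        lrr = expected_lrr(0.03, a, b)
        print(f"{label}: BAF={baf:.4f} mBAF={mirrored_baf(baf):.4f} LRR={lrr:.4f}")
```

Under these assumptions, a 3% aberrant fraction shifts the expected BAF at a heterozygous site by well under 0.02 from 0.5, and copy-neutral LOH leaves the expected LRR at zero, which illustrates why a single SNP carries little signal at such low purity.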
