Abstract
Cancer genomes exhibit profound somatic copy number alterations (SCNAs). Studying tumor SCNAs using massively parallel sequencing provides unprecedented resolution but also gives rise to new challenges in data analysis, which are complicated by tumor aneuploidy and heterogeneity as well as normal cell contamination. The majority of read-depth-based methods use total sequencing depth alone for SCNA inference and undervalue allele-specific signals. We propose a joint segmentation and inference approach that uses both signals to meet some of these challenges. Our method consists of four major steps: 1) extracting the read depth supporting the reference and alternative alleles at each SNP/indel locus and comparing the total read depth and alternative allele proportion between the tumor and matched normal samples; 2) performing joint segmentation on the two signal dimensions; 3) correcting the copy number baseline from which the SCNA state is determined; and 4) calling the SCNA state for each segment based on both signal dimensions. The method is applicable to whole exome/genome sequencing (WES/WGS) as well as SNP array data in a tumor-control study. To test specificity, we applied the method to a dataset containing no SCNAs, created by pairing sequencing replicates of a single HapMap sample as normal/tumor pairs; we also applied it to a large-scale WGS dataset consisting of 88 liver tumors along with adjacent normal tissues. Compared with representative methods, our method demonstrated improved accuracy, scalability to large cancer studies, the capability to handle both sequencing and SNP array data, and the potential to improve the estimation of tumor ploidy and purity.
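To make the two signal dimensions from step 1 concrete, the following is a minimal sketch of how they might be computed at a single SNP/indel locus. It is an illustration under stated assumptions, not the authors' implementation: the function name `locus_signals`, the `depth_scale` parameter (a hypothetical tumor-to-normal library size ratio) and the choice of a log2 ratio for the depth comparison are all assumptions made here.

```python
import math


def locus_signals(tumor_ref, tumor_alt, normal_ref, normal_alt, depth_scale=1.0):
    """Return (log2 depth ratio, shift in alternative allele proportion).

    Inputs are read counts supporting the reference and alternative alleles
    in the tumor and matched normal samples. depth_scale is a hypothetical
    normalization factor (tumor library size / normal library size).
    """
    tumor_total = tumor_ref + tumor_alt
    normal_total = normal_ref + normal_alt
    if tumor_total == 0 or normal_total == 0:
        return None  # no coverage in one sample; the locus is uninformative

    # Signal 1: total read depth in the tumor relative to the normal.
    log_ratio = math.log2(tumor_total / (normal_total * depth_scale))

    # Signal 2: change in the proportion of reads carrying the alternative allele.
    tumor_alt_prop = tumor_alt / tumor_total
    normal_alt_prop = normal_alt / normal_total
    return log_ratio, tumor_alt_prop - normal_alt_prop


if __name__ == "__main__":
    # Heterozygous in the normal (20/20); allelic imbalance in the tumor (45/15).
    print(locus_signals(45, 15, 20, 20))
```

In this example the alternative allele proportion drops from 0.5 in the normal to 0.25 in the tumor, so the second signal is -0.25. A joint segmentation step would then operate on the sequence of such pairs along the genome.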
Highlights
Profound somatic copy number alterations (SCNAs) are present in many types of tumors [1,2,3], where they affect a larger fraction of the genome than other types of somatic variations [3,4].
Detecting tumor SCNAs has its own challenges owing to the complex nature of tumor SCNA profiles, and it is further complicated by the heterogeneity of the cells collected from a tumor tissue and by contamination from adjacent normal cells, so methods tailored to detecting germline copy number variation (CNV) fit tumor SCNA detection poorly.
In contrast to germline copy number variations (CNVs), which are sparsely distributed along the genome and of small to moderate size, tumor SCNAs are large in size and have a much wider range of magnitudes in copy number.
Summary
Profound somatic copy number alterations (SCNAs) are present in many types of tumors [1,2,3], where they affect a larger fraction of the genome than other types of somatic variations [3,4]. The majority of read depth (RD)-based methods, such as CNV-seq [12], SegSeq [9], ExomeCNV [13] and PatternCNV [14], follow a “bottom-up” procedure: short-read mapping, normalization of read depth, copy number estimation in a local region (usually a window of a certain size or an exonic region) and segmentation to merge regions with the same copy number status [10]. This strategy is sensitive in the detection of germline CNVs, but for tumor CNVs (i.e. SCNAs), the local inference struggles to determine the ploidy baseline correctly and to discern weak copy number signals accurately in the presence of aneuploidy and normal cell contamination, so the genome-wide inference drawn in the later segmentation step tends to accumulate false positives from the earlier local inference step. These methods consider only the aggregated depth of sequencing reads carrying paternal and maternal alleles, aiming to estimate the total copy number, and largely ignore allele-specific read depth, even though the latter contains critical information on copy number change, copy-neutral loss of heterozygosity (CN-LOH) and, importantly, genome ploidy.
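The point about allele-specific depth can be illustrated with a small worked example. The sketch below uses a commonly used two-population mixture model (tumor cells plus diploid normal cells), which is an assumption made here for illustration and not necessarily the model used by the authors; the function name `expected_signals` and its parameters are hypothetical. It shows that CN-LOH leaves the total depth unchanged while shifting the allelic proportion.

```python
import math


def expected_signals(purity, major_cn, minor_cn, baseline_ploidy=2.0):
    """Expected (log2 depth ratio, minor allele proportion) at a heterozygous locus.

    Assumes a sample made of a fraction `purity` of tumor cells with the given
    allele-specific copy numbers, and a fraction (1 - purity) of diploid
    normal cells carrying one copy of each allele.
    """
    normal_fraction = 1.0 - purity
    avg_total = purity * (major_cn + minor_cn) + normal_fraction * 2
    avg_baseline = purity * baseline_ploidy + normal_fraction * 2
    if avg_total == 0:
        # Pure tumor with a homozygous deletion: no reads are expected.
        return float("-inf"), float("nan")
    avg_minor = purity * minor_cn + normal_fraction * 1
    return math.log2(avg_total / avg_baseline), avg_minor / avg_total


if __name__ == "__main__":
    for label, major, minor in [("normal 1+1", 1, 1), ("CN-LOH 2+0", 2, 0), ("gain 2+1", 2, 1)]:
        log_ratio, minor_prop = expected_signals(0.6, major, minor)
        print(f"{label}: log2 ratio = {log_ratio:.3f}, minor allele proportion = {minor_prop:.3f}")
```

Under these assumptions, at 60% purity a CN-LOH segment has the same expected log2 depth ratio as a normal diploid segment (zero), but its minor allele proportion falls from 0.5 to 0.2. A method that looks only at total depth cannot separate the two states.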