A Pipeline for Reconstructing Somatic Copy Number Alternation's Subclonal Population-Based Next-Generation Sequencing Data.

Yanshuo Chu, Yadong Wang, Chenxi Nie

doi:10.3389/fgene.2019.01374

Open Access

Journal: Frontiers in Genetics
Publication Date: Feb 27, 2020
Citations: 1
License type: CC BY 4.0

Affiliation: Harbin Institute of Technology

Abstract

State-of-the-art next-generation sequencing (NGS)-based subclonal reconstruction methods perform poorly on somatic copy number alternations (SCNAs), not only because they must simultaneously estimate the subclonal population frequency and the absolute copy number of each SCNA, but also because the tumor and paired normal sequencing data contain complex bias and noise. Both existing NGS-based SCNA detection methods and tools for inferring SCNA subclonal population frequency use the read count ratio (RCR) of the tumor to its paired normal as the key feature of tumor sequencing data. However, sequencing error and bias strongly affect the RCR, producing a large number of redundant SCNA segments that make the subsequent inference of SCNA subclonal population frequency and the subclonal reconstruction time-consuming and inaccurate. We mathematically analyze the number of solutions for an SCNA's subclonal frequency and, based on this analysis, propose a computational algorithm that reduces the impact of false breakpoints. We construct a new probability model that incorporates the RCR bias correction algorithm and, by chaining it with the false breakpoint filtering algorithm, build a complete SCNA subclonal population reconstruction pipeline. Experimental results show that our pipeline outperforms existing subclonal reconstruction programs on both simulated data and TCGA data. Source code is publicly available as a Python package at https://github.com/dustincys/msphy-SCNAClonal.

Highlights

Tumor heterogeneity introduces challenges in cancer tissue diagnosis and subsequent treatment (Nowell, 1976).
We filter out false-positive breakpoints with the algorithm proposed in this paper and use the proposed probability model of subclonal population frequency to infer the subclonal frequency of each somatic copy number alternation (SCNA) segment.
Based on the mathematical analysis, we propose an algorithm to filter out false breakpoints and construct a new probability model for reconstructing the SCNA subclonal population; the model incorporates the read count ratio (RCR) bias correction algorithms we proposed previously.

Summary

Introduction

Tumor heterogeneity introduces challenges in cancer tissue diagnosis and subsequent treatment (Nowell, 1976). To decipher the cell composition of bulk samples, somatic copy number alternations (SCNAs), which are most commonly found in tumor cells (Beroukhim et al., 2010), are used as representative markers for determining tumor subclonal populations from paired tumor and normal tissue (Oesper et al., 2013; Li and Xie, 2015). The benefit of using SCNAs for subclonal reconstruction is that whole genome sequencing (WGS) data do not have to be deeply sequenced (Li and Xie, 2015): SCNAs affect large, multi-kilobase- or megabase-sized regions of the genome, so the average copy number of these regions can be estimated accurately from WGS (Deshwar et al., 2015). The infinite sites assumption (ISA), a commonly accepted and powerful assumption, posits that each mutation occurs only once in the evolutionary history of the tumor.
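
To illustrate why the subclonal frequency and the absolute copy number of an SCNA are hard to estimate together from a region's average copy number, the following minimal sketch enumerates the combinations that explain one observed average. It is an assumption-laden illustration, not the probability model of this paper: it assumes a diploid normal state, a single SCNA per segment, and no sequencing bias or noise, and the function names `average_copy_number` and `candidate_solutions` are hypothetical.

```python
from fractions import Fraction


def average_copy_number(phi, copy_number, normal_copy_number=2):
    """Mean copy number of a segment when a fraction `phi` of cells
    carries an SCNA with absolute copy number `copy_number`
    (illustrative mixing formula; diploid normal assumed)."""
    return (1 - phi) * normal_copy_number + phi * copy_number


def candidate_solutions(avg_cn, max_copy_number=6, normal_copy_number=2):
    """List every (copy_number, phi) pair with 0 < phi <= 1 that
    reproduces the observed average copy number `avg_cn`."""
    solutions = []
    for c in range(0, max_copy_number + 1):
        if c == normal_copy_number:
            continue  # no SCNA: phi would be undefined
        phi = (avg_cn - normal_copy_number) / (c - normal_copy_number)
        if 0 < phi <= 1:
            solutions.append((c, phi))
    return solutions


# One observed average is compatible with several explanations.
observed = average_copy_number(Fraction(1, 2), 3)
for c, phi in candidate_solutions(observed):
    print(f"copy number {c}, subclonal frequency {phi}")
```

Under these assumptions, an average copy number of 2.5 is explained equally well by a copy number of 3 in half of the cells or by a copy number of 4 in a quarter of them, which is why additional information is needed to resolve the ambiguity.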

Methods

Results

Conclusion