A computational framework to unify orthogonal information in DNA methylation and copy number aberrations in cell-free DNA for early cancer detection.

Qiang Wei, Chao Jin, Jinliang Xing, Shanshan Guo, Bingshan Li, Xu Guo, Xiaonan Liu, Yan Wang, Jing An

doi:10.1093/bib/bbac200

Abstract

Cell-free DNA (cfDNA) provides a convenient diagnostic avenue for noninvasive cancer detection. Current methods focus on identifying genomic aberrations in circulating tumor DNA (ctDNA), e.g. mutations, copy number aberrations (CNAs) or methylation changes. In this study, we report a new computational method that unifies two orthogonal pieces of information, namely methylation and CNAs, derived from whole-genome bisulfite sequencing (WGBS) data to quantify low tumor content in cfDNA. The method implements a Bayesian model that enriches ctDNA reads in WGBS data on the basis of hypomethylation haplotypes and subsequently models CNAs for cancer detection. We generated WGBS data for a total of 262 samples, including high-depth (>20×, deduped high mapping quality reads) data for 76 samples with matched triplets (tumor, adjacent normal and cfDNA) and low-depth (~2.5×, deduped high mapping quality reads) data for 186 samples. We identified a total of 54 Mb of hypomethylation haplotype regions for model building, the vast majority of which are not covered by the HumanMethylation450 arrays. We showed that our model substantially enriches ctDNA reads (by tens of fold), yielding clearly elevated CNAs that faithfully match the CNAs in the paired tumor samples. In the 19 hepatocellular carcinoma cfDNA samples, the estimated enrichment is as high as 16-fold, and in the simulation data, the model achieves over 30-fold enrichment at a ctDNA level of 0.5% and a sequencing depth of 600×. We also found that these hypomethylation regions are shared among many cancer types, demonstrating the potential of our framework for pan-cancer early detection.
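
The abstract does not give the form of the Bayes model, so the following is only a minimal sketch of how read-level enrichment based on hypomethylation haplotypes could work in principle. It assumes that each read covers several CpG sites, that per-site methylation probabilities in tumor and normal tissue are known, and that sites are independent given the tissue of origin. The function names (`tumor_posterior`, `fold_enrichment`) and all numeric values are hypothetical and are not taken from the paper.

```python
from math import prod


def tumor_posterior(read, p_tumor, p_normal, prior):
    """Posterior probability that a read is tumor-derived.

    read     : list of 0/1 methylation calls, one per CpG on the read
    p_tumor  : assumed per-CpG methylation probabilities in tumor
    p_normal : assumed per-CpG methylation probabilities in normal tissue
    prior    : assumed prior ctDNA fraction (e.g. 0.005 for 0.5%)
    """
    def likelihood(probs):
        # Independent Bernoulli model per CpG (a simplifying assumption).
        return prod(p if m else 1.0 - p for m, p in zip(read, probs))

    weight_tumor = likelihood(p_tumor) * prior
    weight_normal = likelihood(p_normal) * (1.0 - prior)
    return weight_tumor / (weight_tumor + weight_normal)


def fold_enrichment(fraction_before, fraction_after):
    """Ratio of the ctDNA fraction after read selection to the fraction before."""
    return fraction_after / fraction_before


# Hypothetical region: hypomethylated in tumor, heavily methylated in normal tissue.
P_TUMOR = [0.2, 0.2, 0.2, 0.2]
P_NORMAL = [0.9, 0.9, 0.9, 0.9]
PRIOR = 0.005  # 0.5% ctDNA, the level used in the simulation described above

unmethylated_read = [0, 0, 0, 0]
methylated_read = [1, 1, 1, 1]

post_unmeth = tumor_posterior(unmethylated_read, P_TUMOR, P_NORMAL, PRIOR)
post_meth = tumor_posterior(methylated_read, P_TUMOR, P_NORMAL, PRIOR)
```

Under these assumed values, a fully unmethylated read receives a high tumor posterior despite the low prior, while a fully methylated read receives a posterior near zero. Keeping only high-posterior reads raises the ctDNA fraction in the retained set; for instance, going from a fraction of 0.5% to 15% would correspond to `fold_enrichment(0.005, 0.15)`, i.e. roughly 30-fold.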

Full Text