Abstract

e15511 Background: Cancer is one of the most concerning global public health issues, accounting for 21% of all deaths. Early cancer detection methods help reduce the chance of dying from cancer. The fragmentation patterns of plasma circulating cell-free DNA (cfDNA) differ between tumor patients and healthy people, reflecting aberrant gene regulation in cancer patients. Measuring cfDNA fragmentomics can therefore not only identify biomarkers for early cancer detection but also help reveal the biological regulatory mechanisms of cancer. In this study, we developed a novel method, the multi-fragmentomics early tumor detection method (METD), which combines multiple cfDNA fragmentation features and effectively predicted early-stage cancer patients. Methods: Adapter sequences were removed from the raw data, which were then aligned to the hg19 reference genome. Duplicated reads, low-quality reads, and reads aligned to sex chromosomes were removed. Fragment lengths were normalized by z-score, and GC content was regressed out of each bin using a GBM (gradient boosting machine) regression tree. A baseline was calculated from 100 healthy samples. Six fragmentation features were calculated for each sample: FSD (fragment size distribution), the ratio of short to long fragment counts; EM (end motif), the frequency of 2-6 bp sequence patterns at the 5' end; BPM (break-point motif), the frequency of sequences at the 5' end break point; TFAS (transcription factor accessibility score), the rank of coverage at transcription factor binding sites; CFS (co-fragmentation score), the first principal component of the free-C correlation matrix; and GES (gene expression score), the gene expression predicted from cfDNA low-pass whole-genome sequencing (lp-WGS) data. These six features were then fitted in a penalized logistic regression model for cancer prediction.
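As an illustration of two of the features described above, the following Python sketch computes a short/long fragment ratio and 5' end motif frequencies, along with a z-score against a set of healthy reference values. It is not the METD implementation: the function names, the 100-150 bp and 151-220 bp size windows, the 4 bp motif length, and the use of a healthy baseline as the z-score reference are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean, stdev


def short_long_ratio(lengths, short=(100, 150), long=(151, 220)):
    """Ratio of short to long fragment counts (size windows are assumed)."""
    n_short = sum(short[0] <= n <= short[1] for n in lengths)
    n_long = sum(long[0] <= n <= long[1] for n in lengths)
    return n_short / n_long if n_long else float("nan")


def end_motif_frequencies(fragments, k=4):
    """Relative frequency of the first k bases at each fragment's 5' end.

    Fragments shorter than k bases are skipped; k=4 is an assumed choice
    within the 2-6 bp range.
    """
    counts = Counter(seq[:k].upper() for seq in fragments if len(seq) >= k)
    total = sum(counts.values())
    return {motif: c / total for motif, c in counts.items()}


def z_scores(values, baseline):
    """Standardize values against the mean and sample SD of a baseline."""
    mu, sd = mean(baseline), stdev(baseline)
    return [(v - mu) / sd for v in values]
```

For example, `short_long_ratio([120, 140, 160, 180, 200, 300])` counts two short and three long fragments and ignores the 300 bp fragment, which falls outside both assumed windows.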
Each sample's data were also downsampled to 0.1x, 0.5x, 1x, 3x and 5x to determine the minimum amount of data needed for the prediction model. Results: In the training data set, comprising 208 samples from gastrointestinal cancer patients and 100 samples from healthy individuals, METD detected tumor patients with an AUC of 0.98. The AUC was 0.95 when the training data were downsampled to 0.1x. The method was then tested in a validation data set containing 32 cancer patients and 32 healthy individuals, where it showed a specificity of 0.9 and a sensitivity of 0.88. Conclusions: We developed a new method for early cancer detection that integrates different fragmentomics features and demonstrated that lp-WGS data, even at 0.1x coverage, can be used to distinguish cancer from healthy groups. Studies with larger cohorts are warranted to verify the performance of this method.
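The downsampling and validation steps can be pictured with a minimal Python sketch. It assumes that downsampling means keeping each read independently with probability equal to the ratio of target to current coverage, and that predictions are binary labels (1 for cancer, 0 for healthy). The function names and the toy labels are hypothetical and are not taken from the study.

```python
import random


def downsample_reads(reads, current_cov, target_cov, seed=0):
    """Keep each read with probability target_cov / current_cov (assumed scheme)."""
    frac = min(1.0, target_cov / current_cov)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return [r for r in reads if rng.random() < frac]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for illustration only: 4 cancer and 4 healthy samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

With 32 samples per group, as in the validation set, each misclassified sample changes the corresponding metric by 1/32, or about 0.03.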

