Abstract

Cell-free DNA (cfDNA) carries a footprint of nucleosome occupancy at transcription start sites (TSSs) and has been widely developed for noninvasive health monitoring and disease detection. However, the requirement for high sequencing depth limits its clinical use. Here, we introduce the Autoencoder of cfDNA TSS (AECT) coverage profile, a deep-learning pipeline designed for TSS coverage profiles generated from shallow cfDNA sequencing. AECT outperformed existing single-cell sequencing imputation algorithms both in improving TSS coverage accuracy and in capturing latent biological features that distinguish sex or tumor status. We built classifiers for the detection of breast and rectal cancer using AECT-imputed shallow sequencing data, and their performance was close to that achieved with high-depth sequencing, suggesting that AECT could provide a broadly applicable noninvasive screening approach with high accuracy at a moderate cost.

Highlights

  • Plasma cell-free DNA is an intensively investigated biomarker that has been widely used for noninvasive cancer evaluation and prenatal testing [1,2,3,4]

  • GC content bias influences the number of reads that are mapped to a genomic region, confounds the quantification of transcription start site (TSS) coverage profiles, and is a major cause of batch effects in cell-free DNA (cfDNA) sequencing data [19]

  • Using random permutations, we found that Autoencoder of cfDNA TSS (AECT) did not increase the median area under the receiver operating characteristic curve (AUROC) levels of randomly assumed sample types (p = 0.140, Wilcoxon rank-sum test, Supplementary Figure 7), suggesting that it captured the particular differences between breast cancer patients and noncancer donors

Introduction

Plasma cell-free DNA (cfDNA) is an intensively investigated biomarker that has been widely used for noninvasive cancer evaluation and prenatal testing [1,2,3,4]. cfDNA TSS coverage profiles are informative about biological processes and regulatory networks in organisms, and a set of noninvasive cfDNA coverage-based screening methods has been developed for the detection of cancer [7,8,9], the evaluation of therapeutic effects in cancer, the prediction of pregnancy complications [3, 10], health monitoring in pregnancy [11], and other uses. Most of these methods require deep whole-genome sequencing data, and the resulting cost limits their routine clinical use [7]. A new approach is needed to balance the cost of cfDNA sequencing against the accuracy of TSS coverage profiles.
