Abstract

Introduction: Cell-free DNA (cfDNA) is emerging as a valuable cancer biomarker, enabling both the detection and the epigenetic characterization of malignancies through sequencing. cfDNA sequencing can reveal cancer-specific nucleosome occupancy patterns, which provide insights into tumor type and differentiation status, two factors that determine prognosis and therapy recommendations. Low-coverage whole-genome cfDNA sequencing is a cost-efficient approach to assaying many cfDNA samples; however, the low coverage limits its sensitivity. We developed NucAE, a denoising autoencoder model, to enhance nucleosome occupancy signals from low-coverage cfDNA sequencing.

Methods: We designed NucAE to estimate nucleosome occupancy genome-wide from low-coverage sequencing data. To train the model, we undersampled high-coverage cfDNA-sequencing data to produce pairs of high- and low-coverage samples at varying coverages (1x-90x). The inputs to NucAE are cfDNA fragment data and the corresponding genomic nucleotide sequence. The model uses a symmetrical architecture with two convolutional layers in both the encoder and the decoder, connected by ReLU activations. Mean squared error relative to the high-coverage signal served as the loss function. To evaluate the model independently, we performed peak detection on the nucleosome occupancy signals. We conducted a grid search to optimize hyperparameters.

Results: We found that NucAE enhances the signal-to-noise ratio by detecting periodicities in nucleosome occupancy data. Supplementing the cfDNA fragment signals with the genomic nucleotide sequence as input improved denoising, demonstrating that the model learned the sequence dependence of nucleosome positioning. Independent validation through peak detection on previously unseen samples showed a significant improvement (p<0.001) in accuracy over raw signals at all tested coverages (1x-45x).
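The abstract does not specify how fragments are thinned or how fragment data and sequence are encoded for the model. The sketch below is one plausible reading under stated assumptions, not the authors' code: fragments are half-open (start, end) intervals on one region, undersampling keeps each fragment independently with probability target/source coverage, the occupancy proxy is plain per-base fragment coverage, and the nucleotide sequence becomes four one-hot input channels. All function names are hypothetical.

```python
import numpy as np


def downsample_fragments(fragments, source_cov, target_cov, rng):
    """Keep each fragment independently with probability target_cov / source_cov."""
    if not 0 < target_cov <= source_cov:
        raise ValueError("target_cov must lie in (0, source_cov]")
    keep = rng.random(len(fragments)) < target_cov / source_cov
    return fragments[keep]


def coverage_track(fragments, length):
    """Per-base count of fragments covering each position; fragments are half-open [start, end)."""
    diff = np.zeros(length + 1, dtype=np.int64)
    np.add.at(diff, fragments[:, 0], 1)   # +1 where a fragment starts
    np.add.at(diff, fragments[:, 1], -1)  # -1 just past where it ends
    return np.cumsum(diff[:-1])


def one_hot(seq):
    """Encode A/C/G/T as four channels; any other base (e.g. N) stays all-zero."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((4, len(seq)), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = index.get(base)
        if j is not None:
            out[j, i] = 1.0
    return out


def make_training_pair(fragments, seq, source_cov, target_cov, rng):
    """Return (x, y): x is a 5 x L input (thinned coverage + one-hot sequence), y the full-depth track."""
    length = len(seq)
    y = coverage_track(fragments, length).astype(np.float32)
    thinned = downsample_fragments(fragments, source_cov, target_cov, rng)
    low = coverage_track(thinned, length).astype(np.float32)
    low *= source_cov / target_cov  # assumption: rescale so input and target share a scale
    x = np.vstack([low[None, :], one_hot(seq)])
    return x, y
```

Repeating `make_training_pair` with several `target_cov` values from one high-coverage sample yields the kind of multi-coverage training set the Methods describe.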
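The architecture sentence can be made concrete with a NumPy forward pass. This is a minimal sketch, not NucAE itself: the channel widths (16 and 8), kernel size (9), "same" padding without pooling (which keeps per-base resolution), He initialization, and a linear output layer are illustrative assumptions not given in the abstract. Only the forward pass and the mean-squared-error loss are shown; training would normally rely on an autodiff framework such as PyTorch or JAX.

```python
import numpy as np


def conv1d_same(x, w, b):
    """1-D cross-correlation with 'same' zero padding.
    x: (C_in, L); w: (C_out, C_in, K) with odd K; b: (C_out,). Returns (C_out, L)."""
    k = w.shape[2]
    if k % 2 == 0:
        raise ValueError("kernel size must be odd for 'same' padding")
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)  # (C_in, L, K)
    return np.einsum("clk,ock->ol", windows, w) + b[:, None]


def init_layer(rng, c_in, c_out, k):
    """He-normal weights (suited to ReLU) and zero biases."""
    w = rng.normal(0.0, np.sqrt(2.0 / (c_in * k)), size=(c_out, c_in, k))
    return w, np.zeros(c_out)


def init_params(rng, c_in=5, hidden=16, bottleneck=8, k=9):
    """Two encoder convs (c_in -> hidden -> bottleneck) mirrored by two decoder convs
    (bottleneck -> hidden -> 1). Widths and kernel size are illustrative."""
    widths = [(c_in, hidden), (hidden, bottleneck), (bottleneck, hidden), (hidden, 1)]
    return [init_layer(rng, ci, co, k) for ci, co in widths]


def forward(params, x):
    """Map a (C_in, L) input to a denoised (L,) occupancy track."""
    h = x
    for i, (w, b) in enumerate(params):
        h = conv1d_same(h, w, b)
        if i < len(params) - 1:  # ReLU between layers; the output layer stays linear
            h = np.maximum(h, 0.0)
    return h[0]


def mse(pred, target):
    """Mean squared error against the high-coverage target signal."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))
```

For an input of shape (5, L), one low-coverage channel plus four one-hot sequence channels, `forward(params, x)` returns an (L,) track, and `mse` of that track against the high-coverage signal is the quantity a training loop would minimize. Letting the convolution see the sequence channels is what would allow such a model to pick up sequence-dependent positioning preferences.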
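Peak detection serves as the independent evaluation, but the abstract names neither the peak caller nor the accuracy metric. As a hypothetical stand-in, this sketch calls local maxima that are at least `min_distance` bases apart (keeping the tallest first) and scores them against reference peaks with an F1 score under a positional `tolerance`; both parameters and the F1 metric are assumptions.

```python
import numpy as np


def call_peaks(signal, min_distance):
    """Local maxima, greedily keeping the highest peaks at least min_distance apart."""
    s = np.asarray(signal, dtype=float)
    is_max = (s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:])
    candidates = np.nonzero(is_max)[0] + 1
    order = candidates[np.argsort(-s[candidates], kind="stable")]  # tallest first
    kept = []
    for pos in order:
        if all(abs(pos - k) >= min_distance for k in kept):
            kept.append(int(pos))
    return np.sort(np.array(kept, dtype=int))


def match_f1(pred, ref, tolerance):
    """Greedy one-to-one matching of predicted to reference peaks within +/- tolerance bases."""
    ref = np.asarray(ref)
    used = np.zeros(len(ref), dtype=bool)
    tp = 0
    for p in pred:
        if len(ref) == 0:
            break
        dist = np.abs(ref - p).astype(float)
        dist[used] = np.inf  # each reference peak may be matched only once
        j = int(np.argmin(dist))
        if dist[j] <= tolerance:
            used[j] = True
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)
```

In this setup, reference peaks would be called on the held-out high-coverage track, and the F1 of peaks from the raw low-coverage track would be compared with that of the denoised track at each coverage.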
Remarkably, at extremely low coverages (1x-2x), denoised signals allowed more accurate peak detection than raw signals from 10-20x deeper sequencing.

Conclusions: Using advanced machine learning, we enhanced low-coverage cfDNA-sequencing data to assay nucleosome positions at nucleotide resolution. NucAE achieves an order-of-magnitude improvement over the low-coverage signals and greatly improves the utility of cfDNA-sequencing data for identifying cancer-specific nucleosome occupancy, while preserving its low cost. Our model is the first to use autoencoders for genome-wide signal enhancement, with potential use cases in many other areas of genomics.

Citation Format: Zsolt Balázs, Jean Radig, Noah Wolford, Manuel Schürch, Michael Krauthammer. NucAE: Autoencoder-based enhancement of nucleosome occupancy signals from low-coverage cfDNA sequencing [abstract]. In: Proceedings of the AACR Special Conference: Liquid Biopsy: From Discovery to Clinical Implementation; 2024 Nov 13-16; San Diego, CA. Philadelphia (PA): AACR; Clin Cancer Res 2024;30(21_Suppl):Abstract nr B053.