Abstract

Abnormal changes in DNA methylation are linked to the early stages of carcinogenesis. Identifying these epigenetic changes in circulating tumor DNA (ctDNA) can reveal potential biomarkers for the early diagnosis of various cancers. However, analyzing such data poses bioinformatics challenges: low-abundance ctDNA signals in biopsy samples are difficult to detect with sufficient sensitivity and are often overwhelmed by the complexity of libraries containing hundreds of targeted regions. Read-level methylation analysis holds the promise of more in-depth DNA methylation detection owing to its wide coverage and high sensitivity to rare signals. However, this approach is hindered by the absence of a standardized workflow capable of generating interpretable reports suitable for both bench scientists and professional bioinformaticians. Here, we present a bioinformatics workflow that examines next-generation sequencing (NGS) data and characterizes the read-level methylation patterns of amplicons. Unlike other currently available tools, our method is designed to work with high-multiplex, large-scale targeted assays. It effectively eliminates undesired noise derived from sequencing byproducts such as false CpG calls, dimers, and off-target alignments. Additionally, to accommodate the substantial volume of data generated by state-of-the-art NGS platforms, the workflow enables parallel processing of samples and is compatible with both cloud-based and on-premises computing resources. The workflow provides a comprehensive per-sample visualization of DNA methylation patterns and reports read-level methylation results in a “pattern-as-a-feature” table. In this table, each amplicon epiallelic haplotype (pattern) is represented as a “feature column” that records its occurrence in every sample, and the columns are aggregated across all patterns discovered in the experiment.
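To make the “pattern-as-a-feature” layout concrete, the following is a minimal sketch of how such a table could be assembled. It is not the workflow's actual implementation. The input format (tuples of sample, amplicon, and a per-read pattern string in which `1` marks a methylated CpG and `0` an unmethylated one), the function name `pattern_feature_table`, and the `amplicon:pattern` column naming are all assumptions made for illustration.

```python
from collections import Counter


def pattern_feature_table(reads):
    """Count how often each (amplicon, pattern) occurs in each sample.

    `reads` is an iterable of (sample, amplicon, pattern) tuples.
    This input format is hypothetical, not taken from the abstract.
    """
    counts = Counter()
    samples, features = set(), set()
    for sample, amplicon, pattern in reads:
        # One feature column per distinct pattern observed on an amplicon.
        feature = f"{amplicon}:{pattern}"
        counts[(sample, feature)] += 1
        samples.add(sample)
        features.add(feature)

    # Columns cover every pattern discovered in the whole experiment,
    # so a sample that lacks a pattern gets an explicit zero.
    columns = sorted(features)
    rows = {s: [counts[(s, f)] for f in columns] for s in sorted(samples)}
    return columns, rows


# Toy reads, invented purely for illustration.
reads = [
    ("S1", "amp1", "111"),
    ("S1", "amp1", "111"),
    ("S1", "amp1", "000"),
    ("S2", "amp1", "000"),
    ("S2", "amp2", "10"),
]
columns, table = pattern_feature_table(reads)
print(columns)
for sample, row in table.items():
    print(sample, row)
```

Each row is one sample and each column is one pattern, so the result can be passed to a downstream model as an ordinary feature matrix. Raw counts are used here for simplicity; whether the workflow reports counts, frequencies, or another measure is not stated in the abstract.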
These read-level patterns, along with other information, can be used to develop machine learning algorithms that iteratively harvest true predictive features and penalize confounding signals in predicting cancer diagnosis.

Citation Format: Mingda Jin, Masatomo Kaneko, Steven Cen, Hongtao Li, Wei Guo, Xinyi Zhou, Atsuko Fujihara, Tsuyoshi Iwata, Lorenzo Storino Ramacciotti, Divyangi Paralkar, Giovanni E Cacciamani, Manju Aron, Osamu Ukimura, Inderbir S. Gill, Gangning Liang, Andre L. Abreu, Jeffrey Bhasin, Xiaojing Yang, Xi-Yu Jia. Read-level methylation pattern extraction for high-multiplex large-scale targeted NGS assay [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3493.