Reliable detection of stochastic epigenetic mutations and associations with cardiovascular aging.

Yaroslav Markov, Morgan Levine, Albert T Higgins-Chen

doi:10.1007/s11357-024-01191-3

Abstract

Stochastic epigenetic mutations (SEMs) have been proposed as novel aging biomarkers to capture heterogeneity in age-related DNA methylation changes. SEMs are defined as outlier methylation patterns at cytosine-guanine dinucleotide (CpG) sites, categorized as hypermethylated (hyperSEM) or hypomethylated (hypoSEM) relative to a reference. Because SEMs are defined by their outlier status, it is critical to differentiate extreme values due to technical noise or data artifacts from those due to real biology. Using technical replicate data, we found that SEM detection is not reliable: across three datasets, 24 to 39% of hypoSEMs and 46 to 67% of hyperSEMs are not shared between replicates. We identified factors influencing SEM reliability, including blood cell type composition, probe beta-value statistics, genomic location, and presence of SNPs. We used these factors in a training dataset to build a machine learning-based filter that removes unreliable SEMs, and found that this filter enhances reliability in two independent validation datasets. We assessed associations between SEM loads and aging phenotypes in the Framingham Heart Study and discovered that associations with aging outcomes were in large part driven by hypoSEMs at baseline methylated probes and hyperSEMs at baseline unmethylated probes, which are the same subsets that demonstrate the highest technical reliability. These aging associations were preserved after filtering out unreliable SEMs and were enhanced after adjusting for blood cell composition. Finally, we used these insights to formulate best practices for SEM detection and introduce a novel R package, SEMdetectR, which uses parallel programming for efficient SEM detection with comprehensive options for detection, filtering, and analysis.
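To make the outlier definition and the replicate comparison concrete, the following is a minimal Python sketch. It is not the SEMdetectR API and is not taken from the paper. It assumes the interquartile-range rule often used in the SEM literature (a beta value above Q3 + 3×IQR is a hyperSEM, below Q1 − 3×IQR a hypoSEM), and it assumes that "not shared between replicates" can be summarized as one minus the overlap of the two call sets divided by their union; the paper may define both differently. The probe names, beta values, and function names are hypothetical.

```python
from statistics import quantiles

def sem_calls(reference, sample, k=3.0):
    """Call SEMs in one sample against per-probe reference distributions.

    reference: dict mapping probe ID -> list of beta values in reference samples
    sample:    dict mapping probe ID -> beta value in the sample of interest
    k:         IQR multiplier (3.0 is an assumed, commonly used threshold)
    Returns a dict mapping probe ID -> "hyper" or "hypo" for outlier probes only.
    """
    calls = {}
    for probe, betas in reference.items():
        q1, _, q3 = quantiles(betas, n=4, method="inclusive")
        iqr = q3 - q1
        value = sample[probe]
        if value > q3 + k * iqr:
            calls[probe] = "hyper"
        elif value < q1 - k * iqr:
            calls[probe] = "hypo"
    return calls

def fraction_not_shared(calls_a, calls_b, kind):
    """Fraction of SEMs of one kind found in only one of two replicates.

    Assumed metric: 1 - |A intersect B| / |A union B| over the called probes.
    """
    a = {p for p, c in calls_a.items() if c == kind}
    b = {p for p, c in calls_b.items() if c == kind}
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)

# Hypothetical reference betas for three probes (methylated, unmethylated, intermediate).
reference = {
    "probe_meth":   [0.80, 0.82, 0.81, 0.83, 0.79, 0.80, 0.82, 0.81],
    "probe_unmeth": [0.05, 0.04, 0.06, 0.05, 0.05, 0.04, 0.06, 0.05],
    "probe_mid":    [0.50, 0.48, 0.52, 0.51, 0.49, 0.50, 0.47, 0.53],
}

# Two technical replicates of the same hypothetical sample.
replicate_1 = {"probe_meth": 0.40, "probe_unmeth": 0.30, "probe_mid": 0.50}
replicate_2 = {"probe_meth": 0.42, "probe_unmeth": 0.05, "probe_mid": 0.51}

calls_1 = sem_calls(reference, replicate_1)
calls_2 = sem_calls(reference, replicate_2)

print(calls_1)
print(calls_2)
print("hypoSEM not shared:", fraction_not_shared(calls_1, calls_2, "hypo"))
print("hyperSEM not shared:", fraction_not_shared(calls_1, calls_2, "hyper"))
```

In this toy data the hypoSEM at the methylated probe appears in both replicates, while the hyperSEM at the unmethylated probe appears in only one. That is the kind of replicate discordance the reliability analysis quantifies, here on invented numbers rather than real array data.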

Full Text