Abstract

Stochastic epigenetic mutations (SEMs) have been proposed as novel aging biomarkers to capture heterogeneity in age-related DNA methylation changes. SEMs are defined as outlier methylation patterns at cytosine-guanine dinucleotide sites, categorized as hypermethylated (hyperSEM) or hypomethylated (hypoSEM) relative to a reference. Because SEMs are defined by their outlier status, it is critical to differentiate extreme values due to technical noise or data artifacts from those due to real biology. Using technical replicate data, we found that SEM detection is not reliable: across three datasets, 24 to 39% of hypoSEMs and 46 to 67% of hyperSEMs are not shared between replicates. We identified factors influencing SEM reliability, including blood cell type composition, probe beta-value statistics, genomic location, and presence of SNPs. We used these factors in a training dataset to build a machine learning-based filter that removes unreliable SEMs, and found that this filter enhances reliability in two independent validation datasets. We assessed associations between SEM loads and aging phenotypes in the Framingham Heart Study and discovered that associations with aging outcomes were in large part driven by hypoSEMs at baseline methylated probes and hyperSEMs at baseline unmethylated probes, which are the same subsets that demonstrate the highest technical reliability. These aging associations were preserved after filtering out unreliable SEMs and were enhanced after adjusting for blood cell composition. Finally, we used these insights to formulate best practices for SEM detection and introduce a novel R package, SEMdetectR, which uses parallel programming for efficient SEM detection with comprehensive options for detection, filtering, and analysis.
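
To make the outlier definition and the replicate comparison concrete, the following is a minimal Python sketch, not the SEMdetectR implementation. It assumes the interquartile-range rule commonly used in the SEM literature (a beta value more than three IQRs above the third quartile or below the first quartile of a probe is called a SEM); the abstract itself does not state the threshold. It also assumes that "not shared" means called in one replicate but not in both. The function names `detect_sems` and `fraction_not_shared` and the parameter `k` are hypothetical.

```python
import numpy as np


def detect_sems(betas, k=3.0):
    """Call SEMs in a probes x samples matrix of beta values.

    Assumption: per-probe quartiles are computed across the samples in
    `betas` itself, and a value is an outlier if it lies more than
    k * IQR beyond the first or third quartile.
    Returns two boolean arrays (hyper, hypo) with the shape of `betas`.
    """
    betas = np.asarray(betas, dtype=float)
    q1 = np.percentile(betas, 25, axis=1, keepdims=True)
    q3 = np.percentile(betas, 75, axis=1, keepdims=True)
    iqr = q3 - q1
    hyper = betas > q3 + k * iqr   # hypermethylated outliers (hyperSEM)
    hypo = betas < q1 - k * iqr    # hypomethylated outliers (hypoSEM)
    return hyper, hypo


def fraction_not_shared(calls_a, calls_b):
    """Fraction of SEM calls made in either replicate but not in both.

    Assumption: `calls_a` and `calls_b` are boolean vectors over the same
    probes for two technical replicates of one sample.
    """
    calls_a = np.asarray(calls_a, dtype=bool)
    calls_b = np.asarray(calls_b, dtype=bool)
    union = np.logical_or(calls_a, calls_b).sum()
    if union == 0:
        return 0.0  # no SEMs called in either replicate
    shared = np.logical_and(calls_a, calls_b).sum()
    return 1.0 - shared / union
```

Under these assumptions, running `detect_sems` on each replicate's reference-relative calls and passing the per-probe hyperSEM (or hypoSEM) vectors to `fraction_not_shared` yields a discordance figure comparable in spirit to the percentages reported above.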
