Abstract
Estimating an individual's age from DNA methylation at age-associated CpG sites may provide key information for forensic investigations. Systematic marker screening and feature selection are critical to the performance of the final prediction model. In the discovery stage, we screened 811,876 CpGs in whole blood from 2,664 Chinese individuals aged 18 to 83 years using a stepwise conditional epigenome-wide association study (SCEWAS). The SCEWAS identified 28 CpGs showing genome-wide significant and independent effects. Further restricting this panel to the 10 most informative CpGs resulted in a tolerable loss of information. A linear model consisting of these 10 CpGs explained 93% of the age variance (R² = 0.93) in the training set (n = 2,664). In an independent test set of Chinese individuals (n = 648), this model also provided highly accurate predictions (R² = 0.85; mean absolute deviation, MAD = 3.20 years). The model was additionally validated in a public dataset of multiple ancestral origins (86 Europeans, 14 Asians, and 273 Africans), where the prediction accuracy was significantly reduced (R² = 0.85, MAD = 6.21 years), as might be expected given the different genomic backgrounds, sample sizes, and age ranges. Our 10-CpG model also outperformed the recently proposed 9-CpG model constructed in 390 Chinese males (R² = 0.79 in the test set). We also demonstrated that our SCEWAS approach outperformed the traditional EWAS and the elastic net approach in obtaining a small set of the most age-informative CpGs. Overall, our systematic genome-wide feature selection identified a small panel of 10 CpGs for accurate age estimation with high potential in forensic applications.
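The two accuracy measures reported above, R² and MAD, can be illustrated with a minimal sketch of a 10-predictor linear age model. This is not the authors' pipeline: the methylation values are synthetic, the slopes and noise level are arbitrary assumptions, and the function names are hypothetical. It assumes ordinary least squares and takes MAD to be the mean absolute difference between predicted and chronological age.

```python
import numpy as np

def fit_linear_age_model(beta, age):
    """Ordinary least squares: age ~ intercept + sum_j w_j * beta_j."""
    X = np.column_stack([np.ones(len(age)), beta])
    coef, *_ = np.linalg.lstsq(X, age, rcond=None)
    return coef

def predict_age(coef, beta):
    """Apply the fitted intercept and per-CpG weights."""
    return coef[0] + beta @ coef[1:]

def r_squared(age, pred):
    """Proportion of age variance explained by the predictions."""
    ss_res = np.sum((age - pred) ** 2)
    ss_tot = np.sum((age - np.mean(age)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mean_absolute_deviation(age, pred):
    """Mean absolute difference between predicted and true age, in years."""
    return float(np.mean(np.abs(age - pred)))

def simulate(rng, n, slopes, noise_sd=0.03):
    """Synthetic data only (assumption): beta values drift linearly with age."""
    age = rng.uniform(18, 83, size=n)
    beta = 0.5 + np.outer(age - 50.0, slopes)
    beta += rng.normal(0.0, noise_sd, size=beta.shape)
    return np.clip(beta, 0.0, 1.0), age

rng = np.random.default_rng(0)
n_cpg = 10
# Arbitrary per-year methylation slopes with random sign (illustrative values).
slopes = rng.choice([-1.0, 1.0], size=n_cpg) * rng.uniform(0.002, 0.004, size=n_cpg)

beta_train, age_train = simulate(rng, 500, slopes)
beta_test, age_test = simulate(rng, 200, slopes)

coef = fit_linear_age_model(beta_train, age_train)
pred_test = predict_age(coef, beta_test)

r2_test = r_squared(age_test, pred_test)
mad_test = mean_absolute_deviation(age_test, pred_test)
print(f"test R^2 = {r2_test:.2f}, test MAD = {mad_test:.2f} years")
```

Evaluating on a held-out set, as in the sketch, mirrors the distinction drawn above between training-set and test-set accuracy; the numbers it prints reflect only the simulated data.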