Abstract

Aim: Smoking strongly influences DNA methylation, with current and never smokers exhibiting different methylation profiles. Methods: To advance the practical applicability of the smoking-associated methylation signals, we used machine learning methodology to train a classifier for smoking status prediction. Results: We show the prediction performance of our classifier on three independent whole-blood datasets demonstrating its robustness and global applicability. Furthermore, we examine the reasons for biologically meaningful misclassifications through comprehensive phenotypic evaluation. Conclusion: The major contribution of our classifier is its global applicability without a need for users to determine a threshold value for each dataset to predict the smoking status. We provide an R package, EpiSmokEr (Epigenetic Smoking status Estimator), facilitating the use of our classifier to predict smoking status in future studies.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call