Abstract

A validated, scalable approach to characterizing (phenotyping) smoking status is needed to facilitate genetic discovery. Using established DNA methylation sites from blood samples as a criterion standard for smoking behavior, we compared three candidate electronic medical record (EMR) smoking metrics derived from longitudinal EMR text notes. With data from the Veterans Aging Cohort Study (VACS), we applied a validated algorithm to classify each smoking-related text note as current, past or never smoking. We compared three alternative summary characterizations of smoking (most recent, modal and trajectories) using descriptive statistics and Spearman's correlation coefficients. Logistic regression and area under the curve (AUC) analyses were used to compare the associations of these phenotypes with two DNA methylation sites, cg05575921 and cg03636183, which are known to be strongly associated with current smoking. DNA methylation data were available from the VACS Biomarker Cohort (VACS-BC), a sub-study of VACS. We also examined whether the associations differed by the certainty of trajectory group assignment (<0.80 vs ≥0.80). Among 140,152 VACS participants, the frequency of each EMR summary smoking phenotype varied by the metric chosen: current, 33 to 53 percent; past, 16 to 24 percent; and never, 24 to 33 percent. Among pairs of EMR summary metrics, agreement was highest between modal and trajectories (rho = 0.89). Among 728 individuals in the VACS-BC, both DNA methylation sites were associated with all three EMR summary metrics (p<0.001), but the strongest association with both sites was observed for trajectories (p<0.001). Longitudinal EMR smoking data support the use of a summary phenotype, whose validity is enhanced when the data are integrated into statistical trajectories.
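
The "most recent" and "modal" summary metrics, and a Spearman comparison between two of them, can be illustrated with a minimal Python sketch. It assumes that each participant's notes have already been classified as current, past or never by the note-level algorithm, which is not reproduced here. Several details are hypothetical illustrations rather than the study's specification: the tie-breaking rule for the mode, the ordinal coding never < past < current, and the toy records. Trajectory modeling is not shown.

```python
from collections import Counter
from datetime import date

# Hypothetical ordinal coding for rank correlation (not specified in the source).
ORDINAL = {"never": 0, "past": 1, "current": 2}


def most_recent(notes):
    """Return the category of the latest-dated note; notes is a non-empty
    list of (date, category) tuples."""
    return max(notes, key=lambda n: n[0])[1]


def modal(notes):
    """Return the most frequent category. Ties are broken toward the category
    seen most recently (an assumed rule, for illustration only)."""
    counts = Counter(cat for _, cat in notes)
    top = max(counts.values())
    tied = {cat for cat, c in counts.items() if c == top}
    # Scan notes newest-first and return the first category among the tied ones.
    for _, cat in sorted(notes, key=lambda n: n[0], reverse=True):
        if cat in tied:
            return cat


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks
    (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented toy records: participant -> classified notes (date, category).
patients = {
    "A": [(date(2010, 1, 5), "current"), (date(2012, 3, 1), "current"), (date(2015, 6, 9), "past")],
    "B": [(date(2011, 2, 2), "never"), (date(2013, 4, 4), "never")],
    "C": [(date(2009, 7, 7), "past"), (date(2014, 8, 8), "current"), (date(2016, 1, 1), "current")],
    "D": [(date(2010, 1, 1), "current"), (date(2011, 1, 1), "past")],
}

recent = {pid: most_recent(n) for pid, n in patients.items()}
mode = {pid: modal(n) for pid, n in patients.items()}
rho = spearman([ORDINAL[recent[p]] for p in patients],
               [ORDINAL[mode[p]] for p in patients])
print(recent, mode, round(rho, 3))
```

In this toy example, participant A's most recent note says "past" but the modal category is "current". That kind of disagreement is what keeps the correlation between metrics below 1 (here rho = 5/6).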
