Abstract

Background
Accurately assessing tobacco smoke exposure in early life is important for understanding and preventing childhood asthma. In populations with relatively low smoking exposure, it is challenging to characterize the main sources of tobacco smoke exposure. Several traditional approaches exist for building a model in which questionnaire items explain variability in nicotine metabolite concentrations. These include directed acyclic graphs, the change-in-estimate procedure, comparison of AIC and/or R² statistics, and stepwise selection. Machine learning may improve on these approaches.

Methods
Using the CHILD Cohort Study, we measured urinary concentrations of two nicotine biomarkers, cotinine and trans-3′-hydroxycotinine (3HC), with a low detection limit of 0.03 ng/mL. We also collected questionnaire responses on smoking and other lifestyle factors. Urine samples and questionnaires were collected at 3-4 months of age, and concentrations were corrected for specific gravity. Random forest regression was used to assign variable importance scores to questionnaire items based on how well they predicted urinary cotinine and 3HC concentrations. We combined insights from this machine learning technique with traditional model selection strategies to select a multivariable linear regression prediction model. This model was used to assess how well the questionnaires explained urinary cotinine and 3HC concentrations.

Results
Overall, 76% of infants had detectable urinary cotinine and 89% had detectable 3HC. Geometric mean levels were consistent with light or intermittent second-hand smoke exposure, third-hand smoke exposure, or possibly dietary sources. The final models explained 32% and 41% of the variation in cotinine and 3HC concentrations, respectively, in our sample (n = 2,017). Questions on prenatal smoking exposure, second-hand smoke, housing, breastfeeding, and socioeconomic factors were the most important predictors of urinary concentrations at 3 months.
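The abstract does not specify how the specific gravity correction and the Results summary statistics (detection frequency and geometric mean) are computed. The minimal sketch below rests on three labeled assumptions. First, it applies the commonly used Levine-Fahy form C × (SG_ref − 1)/(SG − 1). Second, it takes the sample median as the reference specific gravity. Third, it substitutes LOD/√2 for non-detects before taking the geometric mean. The function names `sg_correct` and `summarize` and the example values are hypothetical.

```python
import math
from statistics import median

LOD = 0.03  # ng/mL, the detection limit reported in the abstract

def sg_correct(conc, sg, sg_ref):
    """Specific-gravity correction (assumed Levine-Fahy form): conc * (SG_ref - 1) / (SG - 1)."""
    return conc * (sg_ref - 1.0) / (sg - 1.0)

def summarize(concs, sgs, lod=LOD):
    """Return (% of samples at or above the LOD, geometric mean of SG-corrected concentrations)."""
    # Assumption: the sample median specific gravity is used as the reference value.
    sg_ref = median(sgs)
    detected = [c >= lod for c in concs]
    pct_detected = 100.0 * sum(detected) / len(concs)
    # Assumption: non-detects are replaced with LOD / sqrt(2) before correction.
    filled = [c if d else lod / math.sqrt(2) for c, d in zip(concs, detected)]
    corrected = [sg_correct(c, s, sg_ref) for c, s in zip(filled, sgs)]
    # Geometric mean = exp(mean(log(x))); every value must be > 0.
    gm = math.exp(sum(math.log(c) for c in corrected) / len(corrected))
    return pct_detected, gm

# Hypothetical example: four urine samples (ng/mL) and their specific gravities.
pct, gm = summarize([0.5, 0.01, 1.2, 0.2], [1.010, 1.005, 1.015, 1.010])
print(f"{pct:.0f}% detected, geometric mean {gm:.3f} ng/mL")
```

A fixed reference value such as 1.020 is also common in place of the sample median. The choice rescales all corrected values but does not change their ranking.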
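The random forest variable-importance step can be illustrated with a dependency-free Python sketch. This is an illustrative implementation under stated assumptions, not the study's code.

The trees are depth-limited regression trees grown on bootstrap samples. Each split considers `mtry` randomly chosen features, defaulting to ⌊p/3⌋, the regression default in R's randomForest package. Importance is measured by permutation: the increase in mean squared error when one questionnaire item's values are shuffled. Packages such as randomForest compute this on out-of-bag samples; for brevity, this sketch reuses the data passed in. All function names and the synthetic data are hypothetical.

```python
import random
from statistics import fmean

def _sse(vals):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not vals:
        return 0.0
    m = fmean(vals)
    return sum((v - m) ** 2 for v in vals)

def build_tree(X, y, idx, depth, max_depth, min_leaf, mtry, rng):
    """Grow a regression tree on rows `idx`; each leaf predicts its mean outcome."""
    ys = [y[i] for i in idx]
    if depth >= max_depth or len(idx) < 2 * min_leaf:
        return ("leaf", fmean(ys))
    best = None  # (split SSE, feature, threshold)
    for f in rng.sample(range(len(X[0])), mtry):  # random subset of features
        for thr in sorted({X[i][f] for i in idx})[:-1]:
            left = [y[i] for i in idx if X[i][f] <= thr]
            right = [y[i] for i in idx if X[i][f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        return ("leaf", fmean(ys))
    _, f, thr = best
    left_idx = [i for i in idx if X[i][f] <= thr]
    right_idx = [i for i in idx if X[i][f] > thr]
    return ("split", f, thr,
            build_tree(X, y, left_idx, depth + 1, max_depth, min_leaf, mtry, rng),
            build_tree(X, y, right_idx, depth + 1, max_depth, min_leaf, mtry, rng))

def predict_tree(node, x):
    """Follow splits from the root until a leaf is reached."""
    while node[0] == "split":
        _, f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node[1]

def fit_forest(X, y, n_trees=25, max_depth=4, min_leaf=5, mtry=None, seed=0):
    """Fit each tree on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    mtry = mtry or max(1, len(X[0]) // 3)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree(X, y, boot, 0, max_depth, min_leaf, mtry, rng))
    return forest

def mse(forest, X, y):
    """Mean squared error of the forest's averaged predictions."""
    preds = [fmean([predict_tree(tree, x) for tree in forest]) for x in X]
    return fmean([(p - target) ** 2 for p, target in zip(preds, y)])

def permutation_importance(forest, X, y, n_repeats=5, seed=1):
    """Mean increase in MSE after shuffling each feature column."""
    rng = random.Random(seed)
    base = mse(forest, X, y)
    scores = []
    for f in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[f] for row in X]
            rng.shuffle(col)
            Xp = [list(row) for row in X]
            for row, v in zip(Xp, col):
                row[f] = v
            increases.append(mse(forest, Xp, y) - base)
        scores.append(fmean(increases))
    return scores

# Synthetic stand-in data: feature 0 drives the outcome, feature 1 weakly, feature 2 is noise.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(150)]
y = [3 * x0 + 0.5 * x1 + data_rng.gauss(0, 0.1) for x0, x1, _ in X]
forest = fit_forest(X, y)
imp = permutation_importance(forest, X, y)
print([round(s, 3) for s in imp])
```

In the study's setting, the columns of `X` would be encoded questionnaire items and `y` the corrected biomarker concentrations. Items whose shuffling degrades predictions most receive the highest scores.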
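The final multivariable linear regression and its share of explained variance (R²) can be illustrated with a small ordinary least squares sketch. It solves the normal equations by Gauss-Jordan elimination. Regressing log-transformed concentrations is an assumption, because the abstract does not state the transformation. The names `ols_fit` and `r_squared` are hypothetical; in practice, a statistics package would be used.

```python
def ols_fit(X, y):
    """Least-squares coefficients, intercept first, from the normal equations (X'X)b = X'y."""
    A = [[1.0] + list(row) for row in X]  # design matrix with an intercept column
    k = len(A[0])
    # Augmented matrix [X'X | X'y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)] + [sum(a[i] * t for a, t in zip(A, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def r_squared(X, y, beta):
    """Proportion of the variance in y explained by the fitted linear model."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

# Toy example with one predictor; y would typically be log(SG-corrected concentration).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 2.0, 2.0, 4.0]
beta = ols_fit(X, y)
print(beta, round(r_squared(X, y, beta), 3))
```

An R² of 0.32 would mean the questionnaire items account for 32% of the variance in the (transformed) cotinine concentrations in the sample.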
Conclusions
To facilitate complex exposure assessment in interdisciplinary projects, researchers can draw on more sensitive tools from outside their traditional disciplines to improve models and generate new hypotheses. Variable importance plots are a relatively easy and interpretable way to incorporate machine learning into environmental health research.
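To show why variable importance plots are easy to read, here is a minimal text-only sketch that ranks importance scores and draws proportional bars. The function name `importance_plot`, the item names, and the scores are hypothetical and made up for illustration; they are not results from the study. Graphical versions are usually produced with plotting tools, such as `varImpPlot` in R's randomForest package.

```python
def importance_plot(scores, width=40):
    """Render a ranked, horizontal text bar chart of variable importance scores."""
    top = max(scores.values())
    label_w = max(len(name) for name in scores)
    lines = []
    for name, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        # Bars are scaled to the largest score; non-positive scores get no bar.
        bar = "#" * round(width * s / top) if top > 0 else ""
        lines.append(f"{name:<{label_w}} | {bar} {s:.3f}")
    return "\n".join(lines)

# Made-up scores for three hypothetical questionnaire items.
print(importance_plot({"item_A": 0.8, "item_B": 0.3, "item_C": 0.05}))
```

Sorting by score puts the most informative questionnaire items at the top. This ordering makes the plot a quick screening aid when choosing candidate variables for a traditional regression model.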
