Background: Variation in the timing of menarche has been linked with adverse health outcomes in later life. There is evidence that exposure to hormonally active agents (endocrine-disrupting chemicals; EDCs) during childhood may play a role in accelerating or delaying menarche. The goal of this study was to generate hypotheses on the relationship between exposure to multiple EDCs and timing of menarche by applying a two-stage machine learning approach.

Methods: We used data from the National Health and Nutrition Examination Survey (NHANES) for years 2005–2008. Data were analyzed for 229 female participants 12–16 years of age who had blood and urine biomarker measures of 41 environmental exposures in seven classes of chemicals, each with >70% of measurements above the limit of detection. We modeled the risk of earlier menarche (<12 years of age vs older) as a function of exposure biomarkers. We applied a two-stage approach: a random forest (RF) to identify important exposure combinations associated with timing of menarche, followed by multivariable modified Poisson regression to quantify associations between exposure profiles (“combinations”) and timing of menarche.

Results: RF identified urinary concentrations of monoethylhexyl phthalate (MEHP) as the most important feature in partitioning girls into homogeneous subgroups, followed by bisphenol A (BPA) and 2,4-dichlorophenol (2,4-DCP). In this first stage, we identified 11 distinct exposure biomarker profiles, containing five different classes of EDCs associated with earlier menarche. MEHP appeared in all 11 exposure biomarker profiles and phenols appeared in five. Using these profiles in the second stage of analysis, we found a relationship between lower MEHP and earlier menarche (MEHP ≤ 2.36 ng/mL vs >2.36 ng/mL: adjusted PR = 1.36, 95% CI: 1.02, 1.80). Combinations of lower MEHP with benzophenone-3, 2,4-DCP, and BPA had similar associations with earlier menarche, though slightly weaker in those smaller subgroups.
For girls without lower MEHP, exposure profiles included other biomarkers (BPA, enterodiol, monobenzyl phthalate, triclosan, and 1-hydroxypyrene); these showed largely null associations in the second-stage analysis. Adjustment for covariates did not materially change the estimates or CIs of these models. We observed weak or null effect estimates for some exposure biomarker profiles, and relevant profiles consisted of no more than two EDCs, possibly due to small sample sizes in subgroups.

Conclusion: A two-stage approach incorporating machine learning was able to identify interpretable combinations of biomarkers in relation to timing of menarche; these should be further explored in prospective studies. Machine learning methods can serve as a valuable tool to identify patterns within data and generate hypotheses that can be investigated within future, targeted analyses.
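The logic of the two-stage approach described in the Methods can be illustrated with a deliberately simplified sketch. It is not the study's analysis: stage one is replaced by a single Gini-based split on one biomarker (the kind of partition a tree inside a random forest makes), and stage two by a crude, unadjusted prevalence ratio with a log-scale Wald interval rather than a multivariable modified Poisson regression. The function names (`best_split`, `prevalence_ratio`) and the eight data points are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def gini(pos, n):
    """Gini impurity of a group with `pos` positive outcomes among `n` members."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_split(values, outcomes):
    """Return the cut point c (rule: value <= c) that minimizes weighted Gini impurity."""
    pairs = sorted(zip(values, outcomes))
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    best_cut, best_score = None, float("inf")
    left_pos = 0
    for i in range(n - 1):
        left_pos += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # cannot place a cut between tied values
        n_left = i + 1
        n_right = n - n_left
        score = (n_left * gini(left_pos, n_left)
                 + n_right * gini(total_pos - left_pos, n_right)) / n
        if score < best_score:
            # Place the cut midway between the two neighboring values.
            best_cut = (pairs[i][0] + pairs[i + 1][0]) / 2.0
            best_score = score
    return best_cut


def prevalence_ratio(values, outcomes, cut):
    """Crude PR (value <= cut vs value > cut) with a 95% log-scale Wald CI.

    Assumes both groups are non-empty and each contains at least one case.
    """
    low = [y for x, y in zip(values, outcomes) if x <= cut]
    high = [y for x, y in zip(values, outcomes) if x > cut]
    a, n1 = sum(low), len(low)    # cases and group size at or below the cut
    c, n0 = sum(high), len(high)  # cases and group size above the cut
    pr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = pr * math.exp(-1.96 * se_log)
    upper = pr * math.exp(1.96 * se_log)
    return pr, lower, upper


# Hypothetical toy data: biomarker concentrations (ng/mL) and an
# earlier-menarche indicator (1 = menarche before age 12). Not NHANES data.
values = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0]
outcomes = [1, 1, 1, 0, 0, 1, 0, 0]

cut = best_split(values, outcomes)                   # stage 1: find a partition
pr, lower, upper = prevalence_ratio(values, outcomes, cut)  # stage 2: quantify it
print(f"cut = {cut}, PR = {pr:.2f}, 95% CI: {lower:.2f}, {upper:.2f}")
```

In a real analysis the partitioning step would typically use an ensemble of trees over all biomarkers, and the second stage would adjust for covariates and use a robust variance estimator. The sketch only shows how a data-driven cut point from the first stage becomes the binary exposure contrast evaluated in the second.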