Aims: Understanding how substance use and sexual risk behaviors are related to sexually transmitted infections (STIs) may help to target HIV risk reduction interventions for substance users. Random forests, a machine learning technique, provide a principled approach to exploring a large number of effects, including interactions, to identify replicable sets of predictive factors. Methods: We used data from Project Aware, a randomized clinical trial conducted among 5012 patients in 9 sexually transmitted disease clinics in the US. Predictive models for the prevalence and incidence of STIs were created. Substance use, sexual risk behaviors, and characteristics of sexual networks were assessed and examined using a random forest machine learning approach. Results: A total of 48 types of sexual acts and 36 types of substance use behaviors were included in the model. Overall, 30.6% of the participants reported weekly drug use, 6.1% were injection drug users, 16.3% reported binge drinking in the last 6 months, and 24.8% had a DAST-10 score greater than 3. Large numbers of predictors (80–90) were useful in predicting STIs, with about 30% of predictors being sexual risk behaviors and 20% being substance use indicators. Interactions between these two classes of predictors were evident. High accuracy in predictions (70%) was achieved. Conclusions: These results provide initial support for the use of random forests to predict STIs. A challenge with these methods is the lack of a statistical test for the significance of individual variables; nevertheless, these methods are useful for exploratory model building in substance abuse research. Financial support: RC2 DA028973 and R21 DA038641.