The nuclear androgen receptor (AR) is one of the most relevant biological targets of Endocrine Disrupting Chemicals (EDCs), which produce adverse effects by interfering with hormonal regulation and endocrine system functioning. This paper describes novel in silico models to identify organic AR modulators in the context of the Collaborative Modeling Project of Androgen Receptor Activity (CoMPARA), coordinated by the National Center for Computational Toxicology (U.S. Environmental Protection Agency). The collaborative project involved 35 international research groups and aimed to prioritize the experimental testing of approximately 40,000 compounds, based on the predictions provided by each participant. Our machine learning approach predicts binding to AR using a consensus of three classification models: multivariate Bernoulli Naive Bayes, Random Forest, and N-Nearest Neighbor. The approach was developed in compliance with the Organisation for Economic Co-operation and Development (OECD) principles, trained on 1,687 ToxCast molecules classified according to 11 in vitro assays, and further validated on a set of 3,882 external compounds. The models provided robust and reliable predictions and were used to gather novel data-driven insights on the structural features related to AR binding, agonism, and antagonism.
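As a rough illustration of how a consensus of three classifiers can be formed, the sketch below combines per-model binary labels by majority vote. The abstract does not state which consensus rule was used, so the majority rule, the function name `consensus_label`, and the example predictions are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(votes: Sequence[int]) -> Optional[int]:
    """Return the majority label among per-model votes, or None on a tie.

    Assumption: a simple majority rule; the source does not specify
    how the three models' predictions are combined.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes).most_common()
    # A tie between the two most frequent labels yields no consensus.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical predictions (1 = binder, 0 = non-binder) for three compounds,
# one list per model: Naive Bayes, Random Forest, N-Nearest Neighbor.
nb_pred = [1, 0, 1]
rf_pred = [1, 0, 0]
n3_pred = [0, 0, 1]

# One consensus label per compound, taken across the three models.
consensus = [consensus_label(v) for v in zip(nb_pred, rf_pred, n3_pred)]
print(consensus)
```

With three models and two classes a tie cannot occur, but returning `None` on ties keeps the helper usable if a model abstains or more classes are added.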