The U.S. Environmental Protection Agency's Endocrine Disruptor Screening Program (EDSP) Tier 1 assays are used to screen for potential endocrine-disrupting chemicals. A model integrating data from 16 high-throughput screening assays to predict estrogen receptor (ER) agonism has been proposed as an alternative to some low-throughput Tier 1 assays. Later work demonstrated that as few as four assays could replicate the ER agonism predictions from the full model with 98% sensitivity and 92% specificity. The current study used chemical clustering to illustrate the coverage of the EDSP Universe of Chemicals (UoC) tested in the existing ER pathway models and to investigate the utility of chemical clustering for evaluating the screening approach, using an existing 4-assay model as a test case. Although the full original assay battery is no longer available, the demonstrated contribution of chemical clustering is broadly applicable to assay sets, chemical inventories, and models, and the data analysis used here can also be applied to future evaluations of minimal assay models considered for screening.

Chemical structures were collected via the CompTox Chemicals Dashboard for 6,947 substances from the more than 10,000 substances in the UoC and grouped based on structural similarity, generating 826 chemical clusters. Of the 1,812 substances run in the original ER model, 1,730 had a single, clearly defined structure. The ER model chemicals with a clearly defined structure that were not present in the EDSP UoC were assigned to chemical clusters using a k-nearest neighbors approach, resulting in 557 EDSP UoC clusters containing at least one ER model chemical. Performance of an existing 4-assay model was then compared with that of the full ER agonist model in relation to the chemical clusters. This was a case study, and a similar analysis can be performed with any subset model in which the same chemicals (or a subset of those chemicals) are screened.

Of the 365 clusters containing more than one ER model chemical, 321 did not have any chemicals predicted to be agonists by the full ER agonist model. The best 4-assay subset ER agonist model disagreed with the full ER agonist model by predicting agonist activity for 122 chemicals from 91 of the 321 clusters. There were 44 clusters with at least two chemicals and at least one agonist according to the full ER agonist model, which allowed accuracy to be assessed on a per-cluster basis. The accuracy of the best 4-assay subset ER agonist model ranged from 50% to 100% across these 44 clusters, with 32 clusters having accuracy ≥90%. Overall, the best 4-assay subset ER agonist model produced 122 false-positive and only 2 false-negative predictions compared with the full ER agonist model. Most false positives (89) were active in only two of the four assays, whereas all but 11 true-positive chemicals were active in at least three assays. False-positive chemicals also tended to have lower area under the curve (AUC) values: 110 of the 122 false positives had an AUC value below 0.214, a value lower than that of 75% of the positives predicted by the full ER agonist model. Many false positives demonstrated borderline activity; the median AUC value for the 122 false positives from the best 4-assay subset ER agonist model was 0.138, whereas the threshold for an active prediction is 0.1. Our results show that the existing 4-assay model performs well across a range of structurally diverse chemicals.
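As a rough illustration of the clustering and k-nearest neighbors assignment described above, the hypothetical sketch below groups chemicals by structural similarity and maps unclustered chemicals onto existing clusters by a majority vote over their most similar neighbors. The abstract does not specify the descriptors, similarity metric, clustering algorithm, or value of k used in the study, so Morgan fingerprints, Tanimoto distance, Butina clustering, and a five-neighbor vote are assumed here purely for illustration.

```python
# Hypothetical sketch only: Morgan fingerprints, Tanimoto distance, Butina
# clustering, and a simple majority-vote k-NN stand in for the (unspecified)
# methods used to build and populate the 826 UoC clusters.
from collections import Counter

from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from rdkit.ML.Cluster import Butina


def fingerprints(smiles_list):
    """Morgan (ECFP4-like) bit vectors for every parsable SMILES."""
    fps = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is not None:
            fps.append(AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048))
    return fps


def cluster_chemicals(fps, cutoff=0.4):
    """Butina clustering on 1 - Tanimoto distances; returns tuples of member indices."""
    dists = []
    for i in range(1, len(fps)):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        dists.extend(1.0 - s for s in sims)
    return Butina.ClusterData(dists, len(fps), cutoff, isDistData=True)


def assign_by_knn(query_fp, ref_fps, ref_cluster_ids, k=5):
    """Assign an unclustered chemical to the majority cluster of its k most
    similar reference chemicals."""
    sims = DataStructs.BulkTanimotoSimilarity(query_fp, ref_fps)
    nearest = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    votes = Counter(ref_cluster_ids[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

The distance cutoff and k shown here are placeholders; in practice they would be chosen to suit the chemical inventory being clustered rather than the values used in the published analysis.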
Although this is a descriptive analysis of previous results, several concepts can be applied to any screening model used in the future. First, clustering the chemicals helps ensure that future screening evaluations consider the broad chemical space represented by the EDSP UoC. Second, the clusters can assist in prioritizing chemicals within specific clusters for screening, based on the activity of chemicals already tested in those clusters. Finally, the clustering approach provides a framework for evaluating which portions of the EDSP UoC chemical space are reliably covered by in silico and in vitro approaches, and where predictions from either method alone, or both methods combined, are most reliable. The lessons learned from this case study can readily be applied to future evaluations of model applicability and to the screening of new datasets.
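One simple way to operationalize this kind of coverage and reliability assessment is per-cluster bookkeeping: for each cluster, tally how many chemicals have been screened and how often a subset model agrees with the full model's agonist call. The sketch below is a minimal illustration under assumed field names ('cluster_id', 'full_model_agonist', 'subset_model_agonist'); these are not the published data schema.

```python
# Minimal sketch of per-cluster concordance bookkeeping; record field names are
# assumptions for illustration, not the published schema.
from collections import defaultdict


def per_cluster_summary(records):
    """records: iterable of dicts with boolean 'full_model_agonist' and
    'subset_model_agonist' calls plus a 'cluster_id'."""
    summary = defaultdict(lambda: {"n": 0, "agree": 0, "fp": 0, "fn": 0})
    for rec in records:
        s = summary[rec["cluster_id"]]
        s["n"] += 1
        full, subset = rec["full_model_agonist"], rec["subset_model_agonist"]
        if full == subset:
            s["agree"] += 1
        elif subset and not full:
            s["fp"] += 1  # subset model calls an agonist the full model does not
        else:
            s["fn"] += 1  # subset model misses a full-model agonist
    return {cid: {**s, "accuracy": s["agree"] / s["n"]} for cid, s in summary.items()}
```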