Background: The aryl hydrocarbon receptor (AhR) plays a crucial role in immune and metabolic processes. The large molecular diversity of ligands capable of activating AhR makes it difficult to determine the structural features useful for designing new potent modulators. In drug discovery, the intricate nature of AhR activation therefore calls for novel tools to address these challenges. Methods: In this study, classification and regression quantitative structure–activity relationship (QSAR) models were developed to identify the most effective method for predicting AhR activity. The initial dataset, obtained by combining data from the ChEMBL and WIPO databases, contained 978 molecules with EC50 values. The predictive models were built with the automated machine learning platform mljar using 10-fold cross-validation (10-CV). Results: On the test set, the classification model achieved an accuracy of 0.760 and an F1 score of 0.789. The regression model had a root-mean-squared error (RMSE) of 5444 and a coefficient of determination (R²) of 0.208. The Shapley Additive Explanations (SHAP) method was then used to better understand how the variables affect the model’s predictions. As a practical application, the best-performing classification model was used to build an AhR web application, implemented in Streamlit and accessible online. Conclusions: These findings may serve as a foundation for further research into QSAR models that could improve understanding of how ligand structure influences the modulation of AhR activity.
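The evaluation described above (10-fold cross-validation, accuracy and F1 for classification, RMSE and R² for regression) can be illustrated with a minimal plain-Python sketch. This is not mljar's implementation: the function names are hypothetical, the split is a simple shuffled (unstratified) k-fold, and the toy inputs are invented for illustration only, not data from the study.

```python
import math
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: shuffle indices 0..n-1 and return k (train, validation) splits.

    Each index appears in exactly one validation fold; fold sizes differ by at most one.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to k folds
    splits = []
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, folds[i]))
    return splits


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive (e.g. 'active') class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-squared error, in the same units as the target (e.g. EC50)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # 978 molecules split into 10 folds of 97 or 98 validation molecules each.
    print([len(val) for _, val in k_fold_indices(978, k=10)])
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]), f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

In a full 10-CV run, a model would be trained on each `train` split, scored on the matching validation split, and the fold scores averaged. Because R² compares the residual error with that of always predicting the mean, a value of 0.208 corresponds to a model that explains roughly a fifth of the variance in the target.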