Abstract

Machine learning is widely applied in drug discovery research to predict molecular properties and aid in the identification of active compounds. Herein, we introduce a new approach that uses model-internal information from compound activity predictions to uncover relationships between target proteins. On the basis of a large-scale analysis generating and comparing machine learning models for more than 200 proteins, feature importance correlation analysis is shown to detect similar compound binding characteristics. Furthermore, rather unexpectedly, the analysis also reveals functional relationships between proteins that are independent of active compounds and binding characteristics. Feature importance correlation analysis does not depend on specific representations, algorithms, or metrics and is generally applicable as long as predictive models can be derived. Moreover, the approach does not require or involve explainable or interpretable machine learning, but only access to feature weights or importance values. On the basis of our findings, the approach represents a new facet of machine learning in drug discovery with potential for practical applications.

Highlights

  • Machine learning is widely applied in drug discovery research to predict molecular properties and aid in the identification of active compounds

  • The concept of feature importance correlation introduced in this study aims to identify relationships between proteins on the basis of machine learning (ML) model-internal information without the need to explain individual predictions

  • The underlying idea was that correlation between important features learned by independent models for different target proteins should be an indicator of relationships between these proteins


Summary

Introduction

In medicinal chemistry and drug design, machine learning (ML) has long been applied to predict molecular properties of compounds, especially biological activity[1,2]. We have reasoned that feature importance distributions might be determined as a model-agnostic and model-internal computational signature of data set properties, without any requirement to interpret predictions. To these ends, we have further extended the feature weighting approach and introduce feature importance correlation analysis to reveal similar data set signatures. In our proof-of-concept study, the methodology was applied to compound activity prediction models, where high feature importance correlation served as an indicator of similar compound binding characteristics of proteins as well as functional relationships. The results of our proof-of-concept investigation are presented.
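The core idea of feature importance correlation can be illustrated with a minimal sketch. It assumes that one importance vector per target protein has already been extracted from an independently trained model over a shared feature set (for example, the `feature_importances_` attribute of a scikit-learn random forest). The target names and importance values below are hypothetical, and the choice of Pearson and Spearman coefficients is an assumption made for illustration, not necessarily the authors' protocol.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_table(importances):
    """Pairwise (Pearson, Spearman) correlation for all target pairs."""
    return {
        (a, b): (pearson(importances[a], importances[b]),
                 spearman(importances[a], importances[b]))
        for a, b in combinations(sorted(importances), 2)
    }


# Hypothetical importance values over the same five features.
importances = {
    "target_A": [0.40, 0.30, 0.15, 0.10, 0.05],
    "target_B": [0.38, 0.32, 0.12, 0.13, 0.05],
    "target_C": [0.05, 0.10, 0.15, 0.30, 0.40],
}

for pair, (r, rho) in correlation_table(importances).items():
    print(pair, f"Pearson={r:.2f}", f"Spearman={rho:.2f}")
```

In this toy setting, targets A and B yield strongly correlated importance vectors and would be flagged as candidates for a relationship, whereas target C ranks the same features in the opposite order. Only the importance values are used; no individual prediction needs to be explained.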

