Real-World Molecular Out-Of-Distribution: Specification and Investigation.

Prudencio Tossou, Cas Wognum, Michael Craig, Hadrien Mary, Emmanuel Noutahi

doi:10.1021/acs.jcim.3c01774

Abstract

This study presents a rigorous framework for investigating molecular out-of-distribution (MOOD) generalization in drug discovery. The concept of MOOD is first clarified through a problem specification that demonstrates how the covariate shifts encountered during real-world deployment can be characterized by the distribution of sample distances to the training set. We find that these shifts can cause performance to drop by up to 60% and uncertainty calibration to degrade by up to 40%. This leads us to propose a splitting protocol that aims to close the gap between deployment and testing. Then, using this protocol, a thorough investigation is conducted to assess the impact of model design, model selection, and data set characteristics on MOOD performance and uncertainty calibration. We find that appropriate representations and algorithms with built-in uncertainty estimation are crucial to improving performance and uncertainty calibration. This study sets itself apart by its exhaustiveness and opens an exciting avenue to benchmark meaningful algorithmic progress in molecular scoring.
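
To make the idea of "sample distances to the training set" concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' implementation: the abstract does not say which molecular representation or distance metric is used, so the sketch assumes toy fingerprints stored as sets of "on" bit indices and a Tanimoto (Jaccard) distance. The names `tanimoto_distance` and `distances_to_training_set` and all of the data are hypothetical.

```python
from typing import FrozenSet, List

# Assumption: a fingerprint is the set of "on" bit indices of a molecule.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - |a & b| / |a | b| (Tanimoto/Jaccard distance)."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def distances_to_training_set(
    train: List[Fingerprint], queries: List[Fingerprint]
) -> List[float]:
    """For each query, return the distance to its nearest training sample."""
    return [min(tanimoto_distance(q, t) for t in train) for q in queries]


# Toy data (hypothetical): the test set resembles the training set,
# while the deployment set is further away from it.
train = [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
test_set = [frozenset({1, 2, 3}), frozenset({2, 3, 5})]
deployment = [frozenset({7, 8, 9}), frozenset({3, 8, 9})]

test_dist = distances_to_training_set(train, test_set)
deploy_dist = distances_to_training_set(train, deployment)

print("test distances:      ", test_dist)
print("deployment distances:", deploy_dist)
```

If the deployment distances are systematically larger than the test distances, as in this toy data, the held-out test set is likely to understate the covariate shift a model will meet in deployment. That is the kind of gap a splitting protocol of the sort described in the abstract aims to close.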


Journal: Journal of Chemical Information and Modeling
Publication Date: Feb 1, 2024
Citations: 3
License type: CC BY-NC-ND 4.0
