Drug design involves identifying and designing molecules that bind well to a given receptor. A vital computational component of this process is the protein-ligand interaction scoring function, which estimates with reasonable accuracy how well various molecules (ligands) bind to a given protein receptor binding pocket. With protein-ligand binding affinity data sets publicly available in both sequential and structural forms, machine learning methods have gained traction as a top choice for developing such scoring functions. While the reported performance of these models is encouraging, several hidden biases in the data sets themselves limit the utility of such models for practical purposes such as virtual screening. In this work, we use published methods to systematically investigate several such factors, or biases, present in these data sets. Our analysis highlights the importance of considering sequence, protein-ligand interaction, and pocket structure similarity when constructing data splits, and it provides an explanation for the good protein-only and ligand-only performance observed in some data sets. Through this study, we offer the community several pointers for designing binding affinity predictors and data sets that can be applied reliably.