Computational target identification plays a pivotal role in drug development. With significant advances in deep learning methods for protein structure prediction, the structural coverage of the human proteome has increased substantially. This progress inspired the development of the first genome-wide method for scanning small-molecule targets. Our method aims to localize drug targets and detect potential off-target effects early in the drug discovery process, thereby improving the success rate of drug development. We have constructed a high-quality database of protein structures with annotated potential binding sites, covering 82% of the protein-coding genome. To search this database effectively, we integrated multiple computational techniques, including both artificial intelligence-based and biophysical model-based methods. This integration led to a target identification method called Multi-Algorithm Integrated Target Fisher (MAI-TargetFisher). MAI-TargetFisher leverages the complementary strengths of these methods while minimizing their weaknesses, enabling precise database navigation to generate a reliably ranked set of candidate targets for an active query molecule. Importantly, our work is the first comprehensive scan of protein surfaces across the entire human genome, aimed at evaluating potential small-molecule binding sites on each protein. In evaluations on benchmarks and on a target identification task, the method achieved high hit rates and good reliability, as validated by wet-lab experiments. A freely accessible web server is available at https://bailab.siais.shanghaitech.edu.cn/mai-targetfisher for non-commercial use.