Contemporary drug discovery paradigms rely heavily on binding assays that probe bio-physicochemical processes. However, this dominant approach overlooks higher-order interactions arising from the intricacies of molecular mechanisms, such as those involving cis-regulatory elements. This omission can impair predictions and constrains the development of computational methods. To address this limitation, I developed a deep learning model that takes an end-to-end approach, relying exclusively on therapeutic information about drugs. By transforming textual representations of drug and virus genetic information into high-dimensional latent representations, the method sidesteps the challenges arising from insufficient information about binding specificities. Its strengths lie in its ability to implicitly account for complexities such as epistasis and chemical–genetic interactions, and to cope with the pervasive challenge of data scarcity. Using a range of modeling and data augmentation techniques, the proposed model performs strongly in out-of-sample validations, even in scenarios with unknown complex interactions. Furthermore, the study highlights the importance of chemical diversity for model training. The method demonstrates the feasibility of deep learning in data-scarce scenarios and offers a promising alternative for drug discovery in situations where knowledge of the underlying mechanisms is limited.