G protein-coupled receptors (GPCRs) are among the most important drug targets, accounting for ~34% of drugs on the market. In drug discovery, accurate modeling and interpretation of ligand bioactivities are critical for screening and optimizing hit compounds. Homologous GPCRs are more likely to interact with chemically similar ligands, and they tend to share common binding modes with ligand molecules. Including homologous GPCRs when learning ligand bioactivities can therefore improve both the accuracy and the interpretability of models, because it increases the training sample size and exploits common ligand substructures that control bioactivity. Accurate modeling and interpretation of ligand bioactivities across homologous GPCRs can be formulated as a multitask learning problem with joint feature learning, which is a natural fit for the group lasso learning algorithm. We thus propose multitask regression learning with group lasso (MTR-GL), implemented by ℓ2,1-norm regularization, to model the bioactivities of ligand molecules, and test the algorithm on a series of 35 representative GPCR datasets that cover nine subfamilies of human GPCRs. The results show that MTR-GL is overall superior to single-task learning methods and to classic multitask learning methods with joint feature learning. Moreover, MTR-GL outperforms state-of-the-art deep multitask learning methods for predicting ligand bioactivities on most datasets (31/35), with an average improvement of 38% in correlation coefficient (r²) and 29% in root-mean-square error over the DeepNeuralNet-QSAR predictors.
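To make the ℓ2,1-norm idea concrete, the following is a minimal sketch of multitask regression with a group lasso penalty, not the authors' implementation. It assumes a squared-error loss per task, a weight matrix W with one row per ligand feature and one column per GPCR, and a plain proximal gradient solver; the function names (`l21_norm`, `prox_l21`, `fit_mtr_gl`) and the synthetic data are hypothetical and chosen only for illustration. The penalty sums the Euclidean norms of the rows of W, so a feature is either kept for all tasks or driven to zero for all of them, which is what allows shared substructures to be read off the nonzero rows.

```python
import numpy as np


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (features x tasks)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def prox_l21(W, threshold):
    """Row-wise soft-thresholding: proximal operator of threshold * ||W||_{2,1}."""
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(row_norms, 1e-12))
    return W * scale


def objective(Xs, ys, W, lam):
    """Mean squared-error loss summed over tasks plus the l2,1 penalty."""
    loss = sum(
        0.5 * np.mean((X @ W[:, t] - y) ** 2)
        for t, (X, y) in enumerate(zip(Xs, ys))
    )
    return float(loss + lam * l21_norm(W))


def fit_mtr_gl(Xs, ys, lam=0.1, n_iter=500):
    """Proximal gradient descent (an assumed solver) for the objective above.

    Xs, ys: per-task design matrices and targets that share d features.
    """
    d, n_tasks = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, n_tasks))
    # Step size 1/L, with L the largest per-task Lipschitz constant.
    L = max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in Xs)
    step = 1.0 / L
    for _ in range(n_iter):
        grad = np.column_stack([
            X.T @ (X @ W[:, t] - y) / X.shape[0]
            for t, (X, y) in enumerate(zip(Xs, ys))
        ])
        W = prox_l21(W - step * grad, step * lam)
    return W


# Synthetic stand-in for three related targets sharing three active features.
rng = np.random.default_rng(0)
true_W = np.zeros((10, 3))
true_W[:3] = rng.normal(size=(3, 3))
Xs = [rng.normal(size=(40, 10)) for _ in range(3)]
ys = [X @ true_W[:, t] for t, X in enumerate(Xs)]
W_hat = fit_mtr_gl(Xs, ys, lam=0.05)
selected = np.flatnonzero(np.linalg.norm(W_hat, axis=1) > 0)  # jointly kept features
```

In this sketch `selected` holds the indices of features retained across all tasks; with real data each column of `Xs[t]` would be a ligand descriptor or fingerprint bit, and `lam` would be tuned by cross-validation.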