Abstract
Introduction: With the accumulation of large-scale pharmacogenomic data, such as whole-genome RNA sequencing, copy number and mutation profiles for tens of thousands of samples that have been further screened with thousands of small molecules and other perturbagens, the question arises of how best to leverage partially overlapping datasets generated at different facilities. Notably, Haibe-Kains et al. observed discordance in state-of-the-art pharmacogenomic repositories. As research groups across the world continue to generate drug screens of variable size and quality, approaches that can learn from such partially overlapping experiments and improve the signal-to-noise ratio are increasingly needed.
Methods: A previously published Bayesian group factor analysis model was shown to outperform other approaches in predicting drug response and identifying gene signatures from pharmacogenomic datasets, particularly by leveraging shared information for a given gene across multiple omics assays. Here, we apply the same model in a similar fashion to learn from shared observations for the same drug across multiple partially overlapping and noisy small-molecule screens. We integrate gene expression, mutation and drug response data from the two largest pan-cancer repositories, at the Broad and Sanger Institutes, and train joint models on the partially overlapping data from both. We evaluate performance in three ways: 1) we test whether the joint model improves the prioritization of known consensus biomarkers for the drugs shared between the two cohorts; 2) we test whether the joint model better recapitulates shared drug mechanisms of action than the single-dataset models; 3) we test whether the joint model improves pathway enrichment compared with the single-dataset models.
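The first evaluation criterion, improved prioritization of known biomarkers, can be illustrated with a minimal sketch. It assumes, hypothetically, that each model yields one weight per gene for a given drug and that genes are ranked by absolute weight. The function names, placeholder gene names and weight values below are illustrative only and are not taken from the study.

```python
from typing import Dict, List


def rank_genes(weights: Dict[str, float]) -> Dict[str, int]:
    """Rank genes by absolute weight; rank 1 is the strongest association."""
    ordered = sorted(weights, key=lambda g: abs(weights[g]), reverse=True)
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def rank_improvement(
    single: Dict[str, float],
    joint: Dict[str, float],
    biomarkers: List[str],
) -> Dict[str, int]:
    """Rank change per biomarker; positive means it moved closer to rank 1
    in the joint model than in the single-dataset model."""
    r_single = rank_genes(single)
    r_joint = rank_genes(joint)
    return {g: r_single[g] - r_joint[g] for g in biomarkers}


# Toy, made-up per-gene weights for one drug (hypothetical values).
single_weights = {"gene_A": 0.20, "gene_B": 0.50, "gene_C": 0.30, "gene_D": 0.10}
joint_weights = {"gene_A": 0.60, "gene_B": 0.40, "gene_C": 0.30, "gene_D": 0.05}

# Suppose gene_A is the known consensus biomarker for this drug.
improvement = rank_improvement(single_weights, joint_weights, ["gene_A"])
print(improvement)
```

In this toy case the assumed biomarker moves from rank 3 in the single-dataset weights to rank 1 in the joint weights, which is the kind of shift the first criterion looks for.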
Results: We evaluated the performance of our joint model for five drugs shared between the two resources: selumetinib, tanespimycin, nutlin 3A, mirdametinib and PLX4720. First, we show that training joint models on partially overlapping pharmacogenomic datasets can improve gene signature identification overall by improving the ranks of known consensus biomarkers. Second, we show that the joint model learns a latent representation of the drugs that better recapitulates the known underlying mechanisms of action for the three serine/threonine kinase inhibitors. Finally, we show that the joint model achieves improved pathway enrichment results for the targeted MAPK/ERK signaling pathway.
Conclusions: We present an application of a Bayesian group factor analysis model in which we employ a drug-centric prior to transfer information about drugs screened in multiple datasets. We show that joint models leveraging partially overlapping large-scale pharmacogenomic datasets from the Broad and Sanger Institutes can improve drug signature identification overall.
Citation Format: Dharani Thirumalaisamy, Sunil K. Joshi, Mehmet Gönen, Olga Nikolova. Drug-centric prior improves drug response signature identification in partially overlapping, large-scale pharmacogenomic datasets [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7365.