BRAF kinase inhibitors have shown promise in treating melanoma patients with BRAF-V600 mutations. However, their clinical benefit is limited by the development of acquired resistance. To address this issue, we conducted a systematic cheminformatic analysis and machine learning modeling study of the chemical space, scaffolds, structure–activity relationships, and activity landscape of human BRAF inhibitors. The final dataset comprised 3,952 molecules. Visualization of physicochemical properties in chemical space showed that molecules in Group 1 (Potent and Active classes) generally have slightly higher molecular weight (MW), number of rotatable bonds (RB), number of hydrogen-bond acceptors (NumHAcceptors), and topological polar surface area (TPSA) than molecules in Group 2 (Intermediate and Inactive classes). Principal component analysis (PCA) shows that the Group 1 data spread widely and overlap with Group 2, which occupies only the left area of the plot. This suggests that chemical structures and scaffolds are more diverse among Group 1 compounds than among Group 2 compounds. Murcko scaffold analysis showed greater scaffold diversity in the Active, Intermediate, and Inactive classes than in the Potent class. However, all four bioactivity-defined classes of our BRAF dataset show a similar distribution of molecules over scaffolds, with a small number of highly populated scaffolds and a large number of singletons. Furthermore, scaffold visualization identified 12 representative Murcko scaffolds. Scaffolds S1C1, S2C1, S2C2, S3C2, and S4C2 are particularly favorable, as indicated by their high scaffold enrichment factor values. Based on the scaffold analysis, we investigated and summarized the local structure–activity relationships (SARs). In addition, the global SAR landscape was explored through quantitative structure–activity relationship (QSAR) modeling and structure–activity landscape visualization.
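The scaffold enrichment factor used above to rank scaffolds is not defined in the abstract. As a rough illustration only, the sketch below assumes one common definition: the fraction of potent molecules among those sharing a scaffold, divided by the fraction of potent molecules in the whole dataset. The function name, the `"Potent"` label, and the toy scaffold identifiers are hypothetical and are not taken from the study.

```python
from collections import Counter


def scaffold_enrichment(records, active_label="Potent"):
    """Return {scaffold: enrichment factor} for (scaffold, activity_class) pairs.

    Assumed definition (not confirmed by the abstract):
        EF = (potent fraction within the scaffold) / (potent fraction overall)
    """
    records = list(records)
    total = len(records)
    total_active = sum(1 for _, cls in records if cls == active_label)
    if total == 0 or total_active == 0:
        raise ValueError("need at least one record and one active molecule")

    baseline = total_active / total
    per_scaffold = Counter(scaf for scaf, _ in records)
    per_scaffold_active = Counter(
        scaf for scaf, cls in records if cls == active_label
    )
    return {
        scaf: (per_scaffold_active[scaf] / count) / baseline
        for scaf, count in per_scaffold.items()
    }


# Toy data with made-up scaffold names, for illustration only.
toy_records = [
    ("scaf_A", "Potent"), ("scaf_A", "Potent"),
    ("scaf_B", "Potent"), ("scaf_B", "Inactive"),
    ("scaf_C", "Inactive"), ("scaf_C", "Inactive"),
]
toy_ef = scaffold_enrichment(toy_records)
```

Under this assumed definition, a value above 1 means the scaffold carries potent molecules more often than the dataset average, which is the sense in which a scaffold would be called favorable.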
Of 14 candidate models for BRAF inhibitors, a QSAR classification model built on all 3,952 molecules was identified as the best. The model was built using the PubChem fingerprint and the extra trees algorithm and achieved an accuracy of 0.920 for the training set, 0.699 for the 10-fold cross-validation set, and 0.733 for the test set. Through a detailed analysis of the structure–activity landscapes, a total of sixteen significant consensus activity cliff (AC) generators were identified (ChEMBL molecule IDs: 4795335, 1822247, 3665863, 5083036, 5086749, 4061139, 1822242, 3665859, 1822250, 3641129, 3641043, 514688, 3641178, 3697934, 3641046, and 368991). These generators offer valuable SAR information for medicinal chemistry. The findings from this study provide new insights and guidelines for hit identification and lead optimization in developing novel BRAF inhibitors.
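The abstract does not say how activity cliffs were detected. As a hedged sketch of the general idea, the code below flags pairs of molecules that are structurally similar but differ strongly in potency, and scores them with the structure–activity landscape index (SALI), a widely used measure defined as the activity difference divided by one minus the similarity. The similarity threshold, the potency-difference threshold, the set-of-bits fingerprint representation, and all names here are assumptions made for illustration, not the study's protocol.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    # Two empty fingerprints are treated as dissimilar here (a convention).
    return len(bits_a & bits_b) / union if union else 0.0


def sali(act_i, act_j, similarity):
    """SALI = |A_i - A_j| / (1 - sim); infinite for identical fingerprints."""
    if similarity >= 1.0:
        return float("inf")
    return abs(act_i - act_j) / (1.0 - similarity)


def find_activity_cliffs(mols, sim_min=0.7, delta_min=2.0):
    """Return (id_i, id_j, similarity, SALI) for pairs that look like cliffs.

    mols maps a molecule id to (set of on-bits, potency such as pIC50).
    sim_min and delta_min are illustrative thresholds, not the study's.
    """
    cliffs = []
    for (id_i, (fp_i, act_i)), (id_j, (fp_j, act_j)) in combinations(
        mols.items(), 2
    ):
        sim = tanimoto(fp_i, fp_j)
        if sim >= sim_min and abs(act_i - act_j) >= delta_min:
            cliffs.append((id_i, id_j, sim, sali(act_i, act_j, sim)))
    return cliffs


# Toy molecules with made-up fingerprints and potencies.
toy_mols = {
    "mol_1": ({1, 2, 3, 4}, 8.5),
    "mol_2": ({1, 2, 3, 4, 5}, 5.5),
    "mol_3": ({7, 8, 9}, 8.0),
}
toy_cliffs = find_activity_cliffs(toy_mols)
```

A molecule that appears in many such pairs would be a candidate activity cliff generator; counting pair memberships per molecule on top of `find_activity_cliffs` is one simple way to rank them.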