Abstract

The epidermal growth factor receptor (EGFR) is a clinically important drug target in lung cancer. The first-generation tyrosine kinase inhibitors used in the clinic are effective against L858R-mutated EGFR. However, relapse of the disease due to the acquisition of a resistance mutation (T790M) renders these inhibitors ineffective. This necessitates the identification of new, potent EGFR inhibitors against the resistant double mutant (T790M/L858R; TMLR). Therefore, four machine learning techniques (instance-based learner (IBK), naïve Bayesian (NB), sequential minimal optimization (SMO), and random forest (RF)) were employed to develop twelve classification models on three different datasets (highly, moderately, and weakly active inhibitors). The models were validated using fivefold cross-validation and independent validation datasets. The random forest-based models performed best. In addition, the functional groups, PubChem fingerprints, and substructures of highly active inhibitors were compared with those of inactive compounds to identify structural features that are important for activity. To promote open-source drug discovery, a tool has been developed that incorporates the best-performing models and allows users to predict the potential of chemical molecules as anti-TMLR inhibitors. It is expected that the machine learning classification models developed in this study will pave the way for identifying novel inhibitors against the resistant EGFR double mutants.

