Human Immunodeficiency Virus (HIV) remains a global public health crisis, and the virus's rapid mutation and resulting drug resistance create a continuing need for new anti-HIV agents. While current Combination Antiretroviral Therapy (CART) has helped control infection and mortality rates, traditional drug development approaches are costly and inefficient. This study addresses this problem by applying machine learning algorithms to lead compound discovery using the extensive, quality-assured DTP Antiviral Screen Databases. Three molecular datasets, based on Extended-Connectivity Fingerprints (ECFP), Simplified Molecular-Input Line-Entry System (SMILES) strings, and 2D molecular IMAGES, were preprocessed using Principal Component Analysis (PCA), train-test splitting, and dataset balancing. Six machine learning algorithms, including linear and nonlinear models, were employed and tuned through 5-fold cross-validation. Model performance was evaluated using the Area Under the Receiver Operating Characteristic curve (AUROC), along with macro-averaged precision, macro-averaged recall, macro-averaged F1 score, and balanced accuracy. Ensemble models were constructed from the top-performing individual models. The best individual model, an SVM trained on the ECFP dataset, achieved 0.78 macro-averaged precision, 0.68 macro-averaged recall, 0.72 macro-averaged F1 score, 0.71 balanced accuracy, and 0.75 AUROC on the testing data. The best ensemble model, which combined SVM, kNN, and logistic regression models trained on the ECFP dataset, achieved 0.82 macro-averaged precision, 0.67 macro-averaged recall, 0.72 macro-averaged F1 score, and 0.70 balanced accuracy on the testing data.
The models were then applied to a PubMed-extracted drug dataset and identified several promising anti-HIV drug candidates, supporting the study's objective of improving the efficiency and success rate of new anti-HIV drug screening and discovery. In summary, this research demonstrates the potential of machine learning to accelerate and optimize the drug discovery process for HIV treatment.
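
For readers unfamiliar with the macro-averaged metrics reported above, the following is a minimal sketch of how they can be computed for a classifier's predictions. It is not the study's code: the function name `macro_metrics`, the toy labels, and the choice to define macro-averaged F1 as the unweighted mean of per-class F1 scores are all assumptions made for illustration.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return macro-averaged precision, recall, and F1 (hypothetical helper).

    Each metric is computed per class and then averaged with equal weight,
    so a rare class (e.g. active compounds) counts as much as a common one.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        r = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy, made-up labels: 1 = active against HIV, 0 = inactive (imbalanced).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

Because every class contributes equally to the average, these scores penalize a model that performs well only on the majority class, which is why they suit imbalanced screening data.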