Abstract

In this work, random forest (RF), support vector machine, k-nearest neighbor and C4.5 decision tree classifiers were used to build models that predict whether an unknown molecule is an inhibitor of human topoisomerase I (Top1). All of these models achieved satisfactory results, with overall prediction accuracies ranging from 89.70% to 97.12%. Comparative analysis showed that the RF model performed best, and its parameters were further tuned to obtain the optimal RF model. In parallel, an RF-based feature selection procedure was used to identify, among 189 molecular descriptors, the properties most relevant to Top1 inhibition. The optimal RF model was then used for ligand-based virtual screening of the Maybridge database, which yielded 596 hits. From these, 67 molecules with relative probability scores above 0.7 were selected and docked to Top1 using AutoDock Vina. Finally, six top-ranked molecules with binding energies below −10.0 kcal/mol were identified; they share a common backbone that is entirely different from those of existing Top1 inhibitors reported in the literature.
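
The two filtering steps at the end of this pipeline (a probability cut-off on the RF output, then a binding-energy cut-off on the docking results) can be sketched as below. This is a hedged illustration, not the authors' code: the Hit record, the compound IDs and all numbers are invented, and the "relative probability score" is assumed to be the RF vote fraction for the inhibitor class. Only the 0.7 and −10.0 kcal/mol thresholds come from the text.

```python
# Minimal sketch of the two-stage screening funnel described in the abstract.
# Assumptions (not from the paper): the Hit record layout, the compound IDs and
# all probability / docking values are invented for illustration only.
from dataclasses import dataclass
from typing import List, Optional

PROB_THRESHOLD = 0.7       # relative probability cut-off applied to RF hits
ENERGY_THRESHOLD = -10.0   # docking cut-off in kcal/mol (more negative = tighter binding)


@dataclass
class Hit:
    compound_id: str
    rf_probability: float                    # assumed: RF vote fraction for "inhibitor"
    binding_energy: Optional[float] = None   # filled in only for docked molecules


def select_for_docking(hits: List[Hit], threshold: float = PROB_THRESHOLD) -> List[Hit]:
    """Keep RF hits whose probability score is strictly above the threshold."""
    return [h for h in hits if h.rf_probability > threshold]


def select_top_binders(docked: List[Hit], cutoff: float = ENERGY_THRESHOLD) -> List[Hit]:
    """Keep docked molecules with binding energy below the cutoff, best (lowest) first."""
    passing = [h for h in docked
               if h.binding_energy is not None and h.binding_energy < cutoff]
    return sorted(passing, key=lambda h: h.binding_energy)


if __name__ == "__main__":
    hits = [
        Hit("CPD-001", 0.92, -10.8),
        Hit("CPD-002", 0.65),          # probability too low, never docked
        Hit("CPD-003", 0.74, -9.1),    # docked, but energy not below -10.0
        Hit("CPD-004", 0.81, -10.3),
    ]
    docking_set = select_for_docking(hits)
    finalists = select_top_binders(docking_set)
    print([h.compound_id for h in docking_set])  # ['CPD-001', 'CPD-003', 'CPD-004']
    print([h.compound_id for h in finalists])    # ['CPD-001', 'CPD-004']
```

In a real workflow the docking step (AutoDock Vina in the study) runs between the two filters; here the energies are simply assumed to be already attached to each record.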

Highlights

  • The results show that the optimal random forest (RF) model filtered a diverse set of useful structures from the Maybridge database

  • To determine whether a compound targeting Top1 is active or inactive, four machine learning (ML) classification models (RF, support vector machine (SVM), k-nearest neighbor (k-NN) and C4.5 decision tree (DT)) were developed in this study. The models were compared on several accuracy measures, and the RF model, evaluated with its internal out-of-bag (OOB) estimate, outperformed the others
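
The internal out-of-bag (OOB) estimate works because each tree in a random forest is trained on a bootstrap sample, so on average about 37% of the molecules are left out of any given tree and can act as a built-in test set. The sketch below illustrates that idea in plain Python under simplifying assumptions: it is not the study's RF, each "tree" is a one-split stump on a randomly chosen descriptor, and the names fit_stump, predict_stump and oob_error are hypothetical.

```python
# Minimal sketch of an out-of-bag (OOB) error estimate for a bagged ensemble.
# Simplifications (assumptions, not the paper's setup): each "tree" is a
# one-level decision stump on one randomly chosen feature. A real RF grows
# full trees and samples features at every split.
import random
from collections import Counter
from typing import Sequence, Tuple

Stump = Tuple[int, float, int, int]  # (feature, threshold, label_if_le, label_if_gt)


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int], feature: int) -> Stump:
    """Pick the threshold on one feature that minimises training errors."""
    best = None
    for t in sorted({row[feature] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        left_lab = Counter(left).most_common(1)[0][0]
        right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
        errors = sum(l != left_lab for l in left) + sum(r != right_lab for r in right)
        if best is None or errors < best[0]:
            best = (errors, (feature, t, left_lab, right_lab))
    return best[1]


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    feature, t, left_lab, right_lab = stump
    return left_lab if row[feature] <= t else right_lab


def oob_error(X, y, n_trees: int = 50, seed: int = 0) -> float:
    """Fraction of samples misclassified by majority vote of the trees that never saw them."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # bootstrap sample (with replacement)
        in_bag = set(idx)
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx],
                          rng.randrange(n_features))       # random feature choice
        for i in range(n):
            if i not in in_bag:                            # sample i is out-of-bag here
                votes[i][predict_stump(stump, X[i])] += 1
    scored = [i for i in range(n) if votes[i]]             # samples with at least one OOB vote
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored) if scored else float("nan")
```

Calling oob_error(X, y) on a small labelled descriptor matrix returns a value between 0 and 1, and one minus that value is the OOB accuracy. scikit-learn's RandomForestClassifier exposes the same estimate through oob_score=True and the fitted oob_score_ attribute.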


Summary

Introduction

Supercoiling, knotting and catenation, the three main types of DNA topology, keep DNA tightly compacted into chromatin [1]. Excessive supercoiling, which alters DNA structure at inopportune times, can seriously hinder replication and transcription [2]. Transient unwinding and relaxation of the parental supercoiled DNA are therefore crucial for maintaining the integrity of the genetic material during cell division [3]. Topoisomerases (Tops) are essential, ubiquitous DNA-processing enzymes that resolve these topological problems by relieving the torsional strain generated during vital cellular processes, including DNA replication, transcription, repair, recombination and segregation, as well as chromatin assembly [4,5,6].

