Abstract

Background: It is known that no single similarity measure gives the best recall of active molecular structures across all types of activity classes. Recent work has shown that the effectiveness of ligand-based virtual screening can be enhanced by data fusion. Data fusion can be implemented using two different approaches: group fusion and similarity fusion. Similarity fusion involves searching with multiple similarity measures; the similarity scores, or rankings, from each measure are then combined to obtain the final ranking of the compounds in the database.

Results: The Condorcet fusion method was examined. This approach combines the outputs of similarity searches from eleven association and distance similarity coefficients; the winning measure for each class of molecules, as determined by Condorcet fusion, was then chosen as the best search method for that class. The proposed method was evaluated using the recall of active molecules in the top 5% of the ranking and significance tests. The MDL Drug Data Report (MDDR), Maximum Unbiased Validation (MUV) and Directory of Useful Decoys (DUD) data sets were used for the experiments, with molecules represented by 2D fingerprints.

Conclusions: Simulated virtual screening experiments with the standard data sets show that Condorcet fusion provides a very simple way of improving ligand-based virtual screening, especially when the active molecules being sought have the lowest degree of structural heterogeneity. However, Condorcet fusion improved effectiveness only slightly when structurally highly diverse sets of actives were being sought.
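To make the idea of Condorcet fusion and the top-5% recall measure concrete, the following is a minimal sketch and not the authors' implementation. It assumes each similarity coefficient yields a complete ranking of the same compounds, decides every pair of compounds by majority vote across rankings, and orders compounds by their number of pairwise wins (a Copeland-style ordering, chosen here for simplicity). The names `condorcet_fuse` and `recall_at_top`, the tie-breaking rule, and rounding the cutoff up are all assumptions made for illustration.

```python
import math
from itertools import combinations


def condorcet_fuse(rankings):
    """Fuse rankings (each a best-first list of the same compounds)
    by pairwise majority voting.

    Compounds are ordered by number of pairwise wins; ties are broken by
    fewer pairwise losses, then by identifier (an illustrative choice).
    """
    items = sorted(rankings[0])
    # positions[r][c] is the rank of compound c in ranking r (0 = most similar)
    positions = [{c: i for i, c in enumerate(r)} for r in rankings]
    wins = {c: 0 for c in items}
    losses = {c: 0 for c in items}
    for a, b in combinations(items, 2):
        a_votes = sum(1 for pos in positions if pos[a] < pos[b])
        b_votes = len(positions) - a_votes
        if a_votes > b_votes:
            wins[a] += 1
            losses[b] += 1
        elif b_votes > a_votes:
            wins[b] += 1
            losses[a] += 1
    return sorted(items, key=lambda c: (-wins[c], losses[c], c))


def recall_at_top(ranked, actives, fraction=0.05):
    """Fraction of all actives found in the top `fraction` of a ranking."""
    cutoff = max(1, math.ceil(len(ranked) * fraction))
    found = sum(1 for c in ranked[:cutoff] if c in actives)
    return found / len(actives)


# Hypothetical rankings from three similarity coefficients
rankings = [
    ["m1", "m2", "m3", "m4"],
    ["m2", "m1", "m3", "m4"],
    ["m1", "m3", "m2", "m4"],
]
fused = condorcet_fuse(rankings)
```

In this toy example `m1` beats every other compound in a majority of the rankings, so it is placed first in the fused list. `recall_at_top(fused, actives)` can then be applied to the fused ranking in the same way as to any single-coefficient ranking.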

Highlights

  • It is known that no single similarity measure gives the best recall of active molecular structures across all types of activity classes

  • Data fusion has been used to combine the results of structure-based and ligand-based approaches to virtual screening [15], with the fused results outperforming any single method in ranking active compounds

  • The first screening system was based on the Tanimoto (TAN) coefficient, which has been used in ligand-based virtual screening for many years and is considered a reference standard
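As an illustration of the Tanimoto coefficient mentioned above, here is a minimal sketch for binary 2D fingerprints. It assumes each fingerprint is given as the collection of its "on" bit positions; the function name `tanimoto` and the convention of returning 0.0 for two empty fingerprints are choices made here, not taken from the source.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two binary fingerprints,
    each given as an iterable of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    common = len(a & b)                  # c: bits set in both fingerprints
    union = len(a) + len(b) - common     # a + b - c
    return common / union if union else 0.0


# Hypothetical fingerprints sharing two of their three on-bits
similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```

Here the two fingerprints share 2 bits out of 4 distinct bits, giving a similarity of 0.5.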


Summary

Introduction

It is known that no single similarity measure gives the best recall of active molecular structures across all types of activity classes. The effectiveness of ligand-based virtual screening approaches can be enhanced by using data fusion, in which the similarity scores, or rankings, from each similarity measure are combined to obtain the final ranking of the compounds in the database. Many virtual screening (VS) approaches have been implemented for searching chemical databases, such as substructure searching, similarity searching, docking and QSAR. A more realistic approach to enhancing the effectiveness of ligand-based virtual screening is the use of data fusion [10], or consensus scoring as it is known in the structure-based virtual screening literature [11]. Data fusion has also been used to combine the results of structure-based and ligand-based approaches to virtual screening [15], with the fused results outperforming any single method in ranking active compounds. Recent reviews of fusion in ligand-based virtual screening can be found in [16,17].
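The combination of scores or rankings described above can be sketched with one simple rule: convert each measure's scores to ranks and sum the ranks per compound. This rank-sum rule is only one possible combination scheme, used here for illustration; it is not stated in the source to be the rule used in the paper. The function name `rank_sum_fuse` and the score dictionaries are hypothetical.

```python
def rank_sum_fuse(score_lists):
    """Fuse several similarity searches by summing per-measure ranks.

    score_lists: list of dicts mapping compound id -> similarity score,
    where a higher score means more similar to the reference structure.
    Returns compound ids ordered best-first (lowest rank sum first).
    """
    compounds = sorted(score_lists[0])
    total = {c: 0 for c in compounds}
    for scores in score_lists:
        # Rank 1 goes to the highest score; ties are broken by identifier
        ordered = sorted(compounds, key=lambda c: (-scores[c], c))
        for rank, c in enumerate(ordered, start=1):
            total[c] += rank
    return sorted(compounds, key=lambda c: (total[c], c))


# Hypothetical scores from two similarity coefficients
scores_a = {"m1": 0.9, "m2": 0.4, "m3": 0.7}
scores_b = {"m1": 0.8, "m2": 0.3, "m3": 0.5}
fused_ranking = rank_sum_fuse([scores_a, scores_b])
```

Working on ranks rather than raw scores avoids having to normalise coefficients that produce values on different scales, which is one reason rank-based rules are a common starting point.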

