Utilizing random Forest QSAR models with optimized parameters for target identification and its application to target-fishing server

Kyoungyeul Lee, Minho Lee, Dongsup Kim

doi:10.1186/s12859-017-1960-x

Abstract

Background: The identification of target molecules is important for understanding the mechanism of “target deconvolution” in phenotypic screening and “polypharmacology” of drugs. Because conventional methods of identifying targets are time-consuming and costly, in-silico target identification has been considered an alternative solution. One of the well-known in-silico methods of identifying targets involves structure-activity relationships (SARs). SARs have advantages such as low computational cost and high feasibility; however, the data dependency in the SAR approach causes imbalance of active data and ambiguity of inactive data across targets.

Results: We developed a ligand-based virtual screening model comprising 1121 target SAR models built using a random forest algorithm. The performance of each target model was tested using the ROC curve and the mean score from an internal five-fold cross validation. Moreover, recall rates for the top-k targets were calculated to assess the performance of target ranking. A benchmark model using an optimized sampling method and parameters was examined on an external validation set. The results show recall rates of 67.6% and 73.9% for top-11 (1% of the total targets) and top-33, respectively. We provide a website where users can search the top-k targets for query ligands, publicly available at http://rfqsar.kaist.ac.kr.

Conclusions: The target models that we built can be used both for predicting the activity of ligands toward each target and for ranking candidate targets for a query ligand using a unified scoring scheme. The scores are additionally fitted to probabilities so that users can estimate how likely a ligand–target interaction is to be active. The user interface of our website is user-friendly and intuitive, offering useful information and cross references.
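The abstract evaluates each target model by its ROC curve. As a minimal sketch of what the area under that curve measures, the block below computes the AUC in its rank-based form: the probability that a randomly chosen active ligand is scored above a randomly chosen inactive one. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; this is not the authors' implementation, and the scores are assumed to be held-out predictions such as those pooled from the five cross-validation folds.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (active, inactive) pairs in which the
    active ligand gets the higher score; ties count as half a win."""
    actives = [s for y, s in zip(labels, scores) if y == 1]
    inactives = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive ligand")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


# Hypothetical held-out predictions for one target model.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives. The pairwise loop is quadratic in the number of ligands, which is fine for illustration but would be replaced by a sort-based computation on real data sets.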

Highlights

The identification of target molecules is important for understanding the mechanism of “target deconvolution” in phenotypic screening and “polypharmacology” of drugs.
Thereafter, the receiver-operating characteristic (ROC) curve and its area under the curve (AUC) value, and the recall for the top-k targets (k = 11 and 33, which correspond to 1% and 3% of total targets, respectively), were evaluated and compared with the results obtained in other studies.
The virtual screening results of the five-fold cross validation were first used to measure the performance for each target model.
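The top-k recall mentioned in the highlights can be sketched as follows. The definition used here is an assumption, since the page does not spell it out: the fraction of known ligand–target pairs whose target appears among the k highest-scoring targets for that ligand. The function name `top_k_recall`, the data layout, and the toy scores are all hypothetical.

```python
def top_k_recall(scores_by_ligand, true_targets_by_ligand, k):
    """Fraction of known ligand-target pairs whose target is ranked
    within the top k targets for that ligand (assumed definition)."""
    hits = 0
    total = 0
    for ligand, scores in scores_by_ligand.items():
        # Rank target names by descending score and keep the first k.
        top_k = set(sorted(scores, key=scores.get, reverse=True)[:k])
        for target in true_targets_by_ligand[ligand]:
            total += 1
            if target in top_k:
                hits += 1
    return hits / total


# Hypothetical scores from three target models for two query ligands.
toy_scores = {
    "lig1": {"T1": 0.9, "T2": 0.2, "T3": 0.5},
    "lig2": {"T1": 0.1, "T2": 0.3, "T3": 0.8},
}
toy_truth = {"lig1": {"T1"}, "lig2": {"T2"}}

print(top_k_recall(toy_scores, toy_truth, k=1))
print(top_k_recall(toy_scores, toy_truth, k=2))
```

In the setting described above, the ranking would run over all 1121 target models with k = 11 or k = 33. Ranking targets against one another only makes sense if the per-target scores are comparable, which is why a unified scoring scheme matters.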

Summary

Introduction

The identification of target molecules is important for understanding the mechanism of “target deconvolution” in phenotypic screening and “polypharmacology” of drugs. Because conventional methods of identifying targets are time-consuming and costly, in-silico target identification has been considered an alternative solution. One of the well-known in-silico methods of identifying targets involves structure-activity relationships (SARs). SARs have advantages such as low computational cost and high feasibility; however, the data dependency in the SAR approach causes imbalance of active data and ambiguity of inactive data across targets. “Target deconvolution,” wherein the actual targets of the molecules are disclosed, is crucial for understanding the mechanism of action, yet it remains challenging [6]. Even if the target of a drug is already known, it is still necessary to predict its association with other targets. Discovering the polypharmacology of drugs can be useful for drug repositioning, which seeks novel uses for existing drugs, and for predicting side effects so that harmful responses can be avoided beforehand [8,9,10].

Journal: BMC Bioinformatics
Publication Date: Dec 1, 2017
Citations: 53
License type: open-access
