Benchmark data sets for structure-based computational target prediction.

Karen T. Schomburg, Matthias Rarey

doi:10.1021/ci500131x

Abstract

Structure-based computational target prediction methods identify potential targets for a bioactive compound. So far, methods based on protein-ligand docking face many challenges, the greatest of which is probably ranking the true targets within a large data set of protein structures. Currently, no standard data sets for evaluation exist, which makes comparing methods and demonstrating improvements cumbersome. Therefore, we propose two data sets and evaluation strategies for a meaningful evaluation of new target prediction methods: a small data set consisting of three target classes for detailed proof-of-concept and selectivity studies, and a large data set consisting of 7992 protein structures and 72 drug-like ligands that allows statistical evaluation with performance metrics on a drug-like chemical space. Both data sets are built from openly available resources, and all information needed to perform the described experiments is reported. We describe the composition of the data sets, the setup of the screening experiments, and the evaluation strategy. We propose performance metrics such as AUC, BEDROC, and NSLR that are capable of measuring early recognition in enrichment. We apply a sequence-based target prediction method to the large data set to analyze how many nontrivial evaluation cases it contains. The proposed data sets are used to evaluate our new inverse screening method iRAISE. The small data set reveals the method's capabilities and limitations in selectively distinguishing between rather similar protein structures. The large data set simulates real target identification scenarios. iRAISE achieves excellent or good enrichment in 55% of the cases, a median AUC of 0.67, and RMSDs below 2.0 Å in 74% of the cases; it ranks the first true target within the top 2% of the protein data set of about 8000 structures in 59 of 72 cases.
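The abstract evaluates rankings with ROC AUC and BEDROC and reports where the first true target lands in the ranked protein data set. As a minimal sketch, assuming each ligand's screening result is available as a best-first list of booleans (True marks a true target structure, no tied scores), these quantities could be computed as below. The function names and the default `alpha=20.0` (a value often used with BEDROC in virtual screening) are illustrative assumptions, not the paper's settings; NSLR is omitted because its definition is not given here. BEDROC is computed by rescaling the RIE between its minimum and maximum possible values.

```python
import math
from typing import Sequence


def active_ranks(ranked_is_target: Sequence[bool]) -> list[int]:
    """1-based ranks of the true targets in a best-first ranked list."""
    return [i + 1 for i, hit in enumerate(ranked_is_target) if hit]


def roc_auc(ranked_is_target: Sequence[bool]) -> float:
    """ROC AUC from a best-first ranking (assumes no tied scores)."""
    n_total = len(ranked_is_target)
    ranks = active_ranks(ranked_is_target)
    n_act, n_dec = len(ranks), n_total - len(ranks)
    if n_act == 0 or n_dec == 0:
        raise ValueError("need at least one true target and one non-target")
    # r_i - i = number of non-targets ranked above the i-th true target
    decoys_above = sum(r - (i + 1) for i, r in enumerate(ranks))
    return 1.0 - decoys_above / (n_act * n_dec)


def bedroc(ranked_is_target: Sequence[bool], alpha: float = 20.0) -> float:
    """BEDROC as the RIE rescaled to [0, 1] by its min and max values."""
    n_total = len(ranked_is_target)
    ranks = active_ranks(ranked_is_target)
    n_act = len(ranks)
    if n_act == 0:
        raise ValueError("need at least one true target")
    ra = n_act / n_total
    # Exponentially weighted sum: early true targets contribute most
    sum_exp = sum(math.exp(-alpha * r / n_total) for r in ranks)
    # Expected value of that sum for a uniformly random ranking
    random_sum = ra * (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1)
    rie = sum_exp / random_sum
    rie_max = (1 - math.exp(-alpha * ra)) / (ra * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ra)) / (ra * (1 - math.exp(alpha)))
    if rie_max == rie_min:  # e.g. every entry is a true target
        return 1.0
    return (rie - rie_min) / (rie_max - rie_min)


def first_hit_fraction(ranked_is_target: Sequence[bool]) -> float:
    """Rank of the first true target as a fraction of the data set size."""
    ranks = active_ranks(ranked_is_target)
    if not ranks:
        raise ValueError("need at least one true target")
    return ranks[0] / len(ranked_is_target)


# Hypothetical screen of 100 structures with true targets at ranks 1, 3 and 50
ranking = [False] * 100
for pos in (0, 2, 49):
    ranking[pos] = True
print(round(roc_auc(ranking), 3))           # 0.835
print(first_hit_fraction(ranking) <= 0.02)  # True: first hit in the top 2%
```

Working from a best-first boolean list keeps the sketch independent of whether a docking score is better when higher or lower; the caller sorts once before computing the metrics.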

Journal: Journal of Chemical Information and Modeling
Publication Date: Aug 1, 2014
Citations: 18

Similar Papers

Protein Identification False Discovery Rates for Very Large Proteomics Data Sets Generated by Tandem Mass Spectrometry
Lukas Reiter ... Ruedi Aebersold
Molecular & Cellular Proteomics | VOL. 8
01 Nov 2009

Transfer learning-based fault location with small datasets in VSC-HVDC
Boyang Shang ... Jiaxin Hei
International Journal of Electrical Power & Energy Systems | VOL. 151
13 Apr 2023

A clustering method for very large mixed data sets
G Sanchez-Diaz ... J Ruiz-Shulcloper
29 Nov 2001

Anonylitics: From a Small Data to a Big Data Anonymization System for Analytical Projects
Alejandro Sierra-Múnera ... Victor Moncayo
01 Jan 2019
