Analysis of Data Fusion Methods in Virtual Screening: Theoretical Model

Martin Whittle,Valerie J Gillet,Jens Loesel,Peter Willett

doi:10.1021/ci049615w

Abstract

This paper presents a theoretical model of how data fusion can be used to combine the results of multiple similarity searches of chemical databases. The model is based on frequency distributions of similarity values that are fused using a multiple integration over regions defined by the particular fusion rule that is being applied. For pairwise fusion, the resulting double integrals are straightforward to evaluate for simple model distributions. Similarity values for recovered-active and recovered-nonactive frequency distributions are independently modeled using a constant background, linearly biased terms, and a first-order correlated term. The model shows that two standard fusion rules can give performance enhancements in some cases but that the results of fusion are dependent on many factors that, taken together, can lead to seemingly inconsistent levels of enhancement.

Full Text