Abstract

This paper presents a theoretical model of how data fusion can be used to combine the results of multiple similarity searches of chemical databases. The model is based on frequency distributions of similarity values, which are fused by multiple integration over the regions defined by the particular fusion rule being applied. For pairwise fusion, the resulting double integrals are straightforward to evaluate for simple model distributions. Similarity values for the recovered-active and recovered-nonactive frequency distributions are modeled independently using a constant background, linearly biased terms, and a first-order correlated term. The model shows that two standard fusion rules can enhance performance in some cases, but that the results of fusion depend on many factors that, taken together, can lead to seemingly inconsistent levels of enhancement.
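The pairwise case can be illustrated with a small numerical sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the joint density form (a constant background of 1, linear bias terms weighted by b1 and b2, and a product term weighted by c standing in for the first-order correlation), the parameter values, and the choice of MAX and mean as the two fusion rules. The function names joint_density, fused_fraction, and mean_rule are hypothetical. The double integral over the region selected by the fusion rule is approximated with a midpoint grid rather than evaluated analytically.

```python
# Illustrative sketch only: the density form, parameters, and fusion rules
# below are assumptions, not the model definitions from the paper.

def joint_density(s1, s2, b1, b2, c):
    """Assumed joint density of two similarity scores on the unit square.

    Constant background (1) + linear bias in each score + product term.
    Integrates to 1; non-negative when |b1| + |b2| + |c| <= 1.
    """
    u, v = 2.0 * s1 - 1.0, 2.0 * s2 - 1.0
    return 1.0 + b1 * u + b2 * v + c * u * v


def mean_rule(s1, s2):
    """Fuse two scores by averaging them (equivalent to a SUM-type rule)."""
    return 0.5 * (s1 + s2)


def fused_fraction(rule, threshold, b1, b2, c, n=200):
    """Fraction of items whose fused score is >= threshold.

    Approximates the double integral of the joint density over the region
    {(s1, s2) : rule(s1, s2) >= threshold} with an n-by-n midpoint grid.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s1 = (i + 0.5) * h
        for j in range(n):
            s2 = (j + 0.5) * h
            if rule(s1, s2) >= threshold:
                total += joint_density(s1, s2, b1, b2, c) * h * h
    return total


if __name__ == "__main__":
    threshold = 0.8
    for name, rule in (("max", max), ("mean", mean_rule)):
        # Hypothetical parameters: actives biased toward high similarity,
        # nonactives left as a flat background.
        actives = fused_fraction(rule, threshold, 0.3, 0.3, 0.2)
        nonactives = fused_fraction(rule, threshold, 0.0, 0.0, 0.0)
        print(name, "actives:", actives, "nonactives:", nonactives)
```

Comparing the active and nonactive fractions retrieved above a threshold for each rule, and then varying b1, b2, c, and the threshold, gives a rough feel for why the level of enhancement from fusion can appear inconsistent across settings.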
