Abstract

High-throughput screening of compounds (chemicals) is an essential part of drug discovery. It involves thousands to millions of compounds and aims to identify candidate hits. Most statistical tools, including the industry-standard B-score method, work on individual compound plates and do not exploit cross-plate correlation or share statistical strength among plates. We present a new statistical framework for high-throughput screening of compounds based on Bayesian nonparametric modeling. The proposed approach identifies candidate hits from multiple plates simultaneously, sharing statistical strength among plates and providing more robust estimates of compound activity. It can flexibly accommodate arbitrary distributions of compound activities and is applicable to any plate geometry. The algorithm provides a principled statistical approach to hit identification and false discovery rate control. Experiments demonstrate significant improvements in hit identification sensitivity and specificity over the B-score and R-score methods, which are highly sensitive to the choice of threshold. These improvements are maintained at low hit rates. The framework is implemented as an efficient R extension package, BHTSpack, and is suitable for large-scale data sets.
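The abstract does not spell out the false discovery rate rule used by BHTSpack. As a hedged illustration only, the sketch below shows one standard Bayesian FDR rule: given a posterior probability of activity for each compound (assumed to come from some fitted model), call the largest set of top-ranked compounds whose average posterior probability of being inactive stays at or below a target level. The function name `select_hits_bayesian_fdr` and the parameter `alpha` are hypothetical and are not part of the package.

```python
import numpy as np

def select_hits_bayesian_fdr(post_prob, alpha=0.05):
    """Hypothetical helper: return indices of compounds called as hits so that
    the estimated Bayesian FDR of the called set is at most alpha.

    post_prob: posterior probability that each compound is active
    (assumed input from some fitted model, not computed here).
    """
    p = np.asarray(post_prob, dtype=float)
    order = np.argsort(-p)                 # most probable hits first
    local_fdr = 1.0 - p[order]             # probability each ranked call is a false hit
    # Estimated FDR of calling the top-k compounds = running mean of local_fdr.
    # local_fdr is ascending, so this running mean is non-decreasing in k.
    cum_fdr = np.cumsum(local_fdr) / np.arange(1, p.size + 1)
    k = int(np.sum(cum_fdr <= alpha))      # largest k meeting the target
    return np.sort(order[:k])

hits = select_hits_bayesian_fdr([0.99, 0.2, 0.97, 0.5, 0.999], alpha=0.05)
print(hits)  # → [0 2 4]
```

Ranking by posterior probability and thresholding the running mean means the rule only needs one number per compound, so it applies to any model that produces posterior activity probabilities.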

Highlights

  • Two types of error can occur in the primary screening process, namely false positive (FP) and false negative (FN) errors

  • Dirichlet process Gaussian mixtures (DPGM)[12,13] constitute a powerful class of nonparametric models that can describe a wide range of distributions encountered in practice

  • An example of multi-task learning is the simultaneous segmentation of multiple images for the purpose of image analysis, which can be facilitated by the use of the hierarchical Dirichlet process (HDP) or a variation of it[16,17]
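To make the DPGM highlight concrete, here is a minimal sketch of drawing data from a Dirichlet process Gaussian mixture using the truncated stick-breaking construction. It is a generic textbook construction, not the model implemented in BHTSpack; the truncation level, the base measure N(mu0, tau0^2), the shared component standard deviation `sigma`, and the function name `sample_dpgm` are all illustrative assumptions.

```python
import numpy as np

def sample_dpgm(n, alpha=1.0, truncation=50, mu0=0.0, tau0=3.0, sigma=0.5, seed=None):
    """Hypothetical helper: draw n samples from a truncated stick-breaking
    approximation of a Dirichlet process Gaussian mixture."""
    rng = np.random.default_rng(seed)
    # Stick-breaking: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j)
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0  # last stick takes the remaining mass so the weights sum to 1
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    w = w / w.sum()  # guard against floating-point round-off
    # Component means drawn from the base measure G0 = N(mu0, tau0^2)
    means = rng.normal(mu0, tau0, size=truncation)
    # Assign each observation to a component, then draw from that Gaussian
    z = rng.choice(truncation, size=n, p=w)
    x = rng.normal(means[z], sigma)
    return x, z, w

x, z, w = sample_dpgm(500, alpha=2.0, seed=0)
print("occupied components:", np.unique(z).size)
```

A smaller `alpha` concentrates the weight on a few components and a larger one spreads it over more, which is how the prior lets the effective number of mixture components adapt to the data rather than being fixed in advance.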

Introduction

Two types of error can occur in the primary screening process: false positives (FP) and false negatives (FN). Because the individual 96-well plates are processed as a whole, artifacts from robotic equipment, unintended differences in concentration, agent evaporation, or other errors[2] might propagate across plates. This type of cross-plate correlation is not accounted for by simple HTS systems that work on individual 96-well plates. High-throughput screening statistical practice[1,3] has traditionally used simple methods such as the B-score[4], R-score[5], Z-score and the normalized percent inhibition (NPI) for measuring compound activity and identifying potential candidate hits. These methods transform the raw compound value into a so-called normalized value, which can be used directly to assess compound activity. The B-score, R-score and Z-score do not use controls in the normalization process, while the NPI makes use of both positive and negative controls.
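The single-plate normalizations named above can be sketched as follows. This is a minimal illustration, not code from BHTSpack or the cited papers: `z_score`, `npi`, `median_polish` and `b_score` are hypothetical helper names. The NPI follows one common convention (positive control = known inhibitor, negative control = uninhibited; some sources swap the roles), and the B-score divides two-way median-polish residuals by their median absolute deviation (MAD); some implementations additionally scale the MAD by 1.4826. The R-score, which fits a robust linear model, is omitted.

```python
import numpy as np

def z_score(plate):
    """Z-score: center by the plate mean, scale by the plate (sample) SD."""
    x = np.asarray(plate, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def npi(raw, pos_controls, neg_controls):
    """Normalized percent inhibition, assumed convention:
    0% at the negative-control mean, 100% at the positive-control mean."""
    pos = np.mean(pos_controls)
    neg = np.mean(neg_controls)
    return 100.0 * (neg - np.asarray(raw, dtype=float)) / (neg - pos)

def median_polish(plate, n_iter=10, tol=1e-6):
    """Tukey two-way median polish: repeatedly subtract row and column
    medians; return the residuals (row/column effects removed)."""
    r = np.array(plate, dtype=float)  # work on a copy
    for _ in range(n_iter):
        row_med = np.median(r, axis=1, keepdims=True)
        r -= row_med
        col_med = np.median(r, axis=0, keepdims=True)
        r -= col_med
        if np.abs(row_med).max() < tol and np.abs(col_med).max() < tol:
            break
    return r

def b_score(plate):
    """B-score: median-polish residuals divided by the MAD of the residuals."""
    resid = median_polish(plate)
    mad = np.median(np.abs(resid - np.median(resid)))
    return resid / mad

rng = np.random.default_rng(1)
plate = rng.normal(0.0, 1.0, size=(8, 12))                            # 96-well layout
plate += 2.0 * np.arange(8)[:, None] + 0.5 * np.arange(12)[None, :]  # row/column artifacts
plate[3, 5] += 30.0                                                  # one simulated strong active
b = b_score(plate)
print(np.unravel_index(np.argmax(b), b.shape))
```

Because each function sees only one plate, none of them can borrow information from other plates, which is the limitation the cross-plate framework is meant to address.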