Abstract
High-throughput and high-content screening involve determining the effect of many compounds on a given target. As currently practiced, screening for each new target typically makes little use of information from screens of prior targets. Further, choices of compounds to advance to drug development are made without significant screening against off-target effects. The overall drug development process could be made more effective, as well as less expensive and time-consuming, if the potential effects of all compounds on all possible targets could be considered, yet the cost of such full experimentation would be prohibitive. In this paper, we describe a potential solution: probabilistic models that can be used to predict results for unmeasured combinations, and active learning algorithms for efficiently selecting which experiments to perform in order to build those models and for determining when to stop. Using simulated and experimental data, we show that our approaches can produce powerful predictive models without exhaustive experimentation and can learn them much faster than by selecting experiments at random.
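To make the idea of predicting unmeasured combinations concrete, the following is a minimal sketch, not the model used in the paper: a partially observed compound-by-target hit matrix is completed with a low-rank logistic model fit by gradient ascent on the masked Bernoulli log-likelihood. All function names, hyperparameters, and the toy data are illustrative assumptions.

```python
# Minimal sketch, not the authors' model: complete a partially observed
# compound-by-target 0/1 hit matrix with a low-rank logistic model.
import numpy as np

rng = np.random.default_rng(0)

def fit_low_rank(observed, mask, rank=3, lr=0.05, epochs=500, reg=0.01):
    """observed: (compounds x targets) 0/1 matrix; mask: 1 where measured."""
    n, m = observed.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(U @ V.T)))
        err = mask * (observed - probs)          # gradient of the masked log-likelihood
        U += lr * (err @ V - reg * U)
        V += lr * (err.T @ U - reg * V)
    return 1.0 / (1.0 + np.exp(-(U @ V.T)))      # predicted hit probabilities for all cells

# Toy usage: 20 compounds x 10 targets with only 30% of the cells measured.
truth = (rng.random((20, 10)) < 0.2).astype(float)
mask = (rng.random(truth.shape) < 0.3).astype(float)
predicted = fit_low_rank(truth * mask, mask)
print("predicted hit probability for one unmeasured pair:", predicted[0, 0])
```

Any probabilistic model that assigns a probability to every unmeasured cell can play this role; the low-rank assumption here is just one simple way of encoding the kind of structure and correlation on which such predictions rely.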
Highlights
It is increasingly accepted that the study of biology requires a paradigm shift from a reductionist framework to a complex systems approach [1,2,3].
For complex systems, the upper bound on the total number of experiments is the number of ways in which the components can be taken in combinations up to some maximum number per experiment (an explicit expression is given below).
We show in extensive computational experiments that a combination of a structure learning method and active learning can achieve high accuracy in predicting condition-specific effects on targets with significantly fewer experiments than a random learner, in many cases reaching perfect accuracy without exhaustive experimentation.
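In symbols (the notation N and K is assumed here, not given in the highlight): for N components combined at most K at a time in a single experiment, this upper bound is

\sum_{k=1}^{K} \binom{N}{k}

which grows rapidly with N; for N = 100 and K = 3, for example, it already amounts to 100 + 4,950 + 161,700 = 166,750 possible experiments.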
Summary
It is increasingly accepted that the study of biology requires a paradigm shift from a reductionist framework to a complex systems approach [1,2,3]. Reductionist frameworks implicitly assume that the object of study is composed of a finite set of subsystems, each functionally, and essentially physically, distinct. In this case there is a reasonable upper bound on the total number of experiments necessary to characterize the whole: one experiment per component per subsystem. For complex systems, however, the number of possible experiments grows combinatorially with the number of components that can be varied together, and exhaustive experimentation quickly becomes prohibitive. We must therefore assume that some structure or correlations exist within the complete data, and that predictive models can be used to capture them and guide future experimentation. Algorithms for this type of problem are termed active learning in the machine learning literature [7,8,9,10]. We show in extensive computational experiments that a combination of a structure learning method and active learning can achieve high accuracy in predicting condition-specific effects on targets with significantly fewer experiments than a random learner, in many cases reaching perfect accuracy without exhaustive experimentation.
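The following is a minimal sketch of the kind of loop the summary describes, assuming simple uncertainty sampling with a stand-in probabilistic model; it is not the structure-learning/active-learning algorithm of the paper, and the model, thresholds, and toy data are illustrative assumptions.

```python
# Minimal sketch of an uncertainty-driven active learning loop (illustrative only).
import numpy as np

rng = np.random.default_rng(1)

def fit_model(observed, mask):
    """Stand-in probabilistic model: smoothed per-compound and per-target hit
    rates combined under an independence assumption. Any model that returns a
    probability for every cell (e.g., the low-rank sketch above) fits here."""
    if mask.sum() == 0:
        return np.full(observed.shape, 0.5)                  # no data yet: maximal uncertainty
    row = (observed.sum(axis=1) + 1.0) / (mask.sum(axis=1) + 2.0)
    col = (observed.sum(axis=0) + 1.0) / (mask.sum(axis=0) + 2.0)
    overall = max(observed[mask == 1].mean(), 1e-3)
    return np.clip(np.outer(row, col) / overall, 1e-3, 1.0 - 1e-3)

def run_experiment(truth, i, j):
    """Stand-in for actually screening compound i against target j."""
    return truth[i, j]

truth = (rng.random((20, 10)) < 0.2).astype(float)           # hidden ground truth
observed = np.zeros_like(truth)
mask = np.zeros_like(truth)
budget, stop_threshold = 60, 0.05

for step in range(budget):
    probs = fit_model(observed, mask)
    uncertainty = 0.5 - np.abs(probs - 0.5)                  # largest where p is near 0.5
    uncertainty[mask == 1] = -np.inf                         # never repeat an experiment
    i, j = np.unravel_index(np.argmax(uncertainty), uncertainty.shape)
    if uncertainty[i, j] < stop_threshold:                   # stopping criterion
        break
    observed[i, j] = run_experiment(truth, i, j)
    mask[i, j] = 1.0

accuracy = ((fit_model(observed, mask) > 0.5) == (truth > 0.5)).mean()
print(f"measured {int(mask.sum())} of {mask.size} cells; accuracy {accuracy:.2f}")
```

In this setting a random learner would replace the argmax with a uniformly chosen unmeasured cell; the claim of the paper is that model-guided selection of this general kind reaches high predictive accuracy with far fewer measured cells.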