Abstract

Background

High throughput methodologies such as microarrays, mass spectrometry and plate-based small molecule screens are increasingly used to facilitate discoveries ranging from gene function to drug candidate identification. These large-scale experiments are typically carried out over the course of months and years, often without the controls needed to compare directly across the dataset. Few methods are available to facilitate comparisons of high throughput metabolic data generated in batches where explicit in-group controls for normalization are lacking.

Results

Here we describe MIPHENO (Mutant Identification by Probabilistic High throughput-Enabled Normalization), an approach for post-hoc normalization of quantitative first-pass screening data in the absence of explicit in-group controls. The approach includes a quality control step and facilitates cross-experiment comparisons that decrease the false non-discovery rate while maintaining the high accuracy needed to limit false positives in first-pass screening. Simulation results show an improvement in both accuracy and false non-discovery rate over a range of population parameters (p < 2.2 × 10⁻¹⁶) and a modest but significant (p < 2.2 × 10⁻¹⁶) improvement in area under the receiver operating characteristic curve: 0.955 for MIPHENO versus 0.923 for a group-based statistic (z-score). Analysis of the high throughput phenotypic data from the Arabidopsis Chloroplast 2010 Project (http://www.plastid.msu.edu/) showed a ~4-fold increase in the ability to detect previously described or expected phenotypes relative to the group-based statistic.

Conclusions

These results demonstrate that MIPHENO offers substantial benefit in improving the ability to detect putative mutant phenotypes from post-hoc analysis of large data sets. Additionally, it facilitates data interpretation and permits cross-dataset comparison where group-based controls are missing. MIPHENO is applicable to a wide range of high throughput screens, and the code is freely available as Additional file 1 as well as through an R package on CRAN.
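
To make the core idea concrete, the sketch below (base R, not the MIPHENO package API) illustrates post-hoc normalization against a group summary when no in-group controls exist: because most samples in a plate or batch are assumed to behave like wild type, the group median can stand in for an explicit control. The column names `plate` and `value` and the example data are hypothetical.

```r
## Minimal sketch of control-free, post-hoc normalization (base R).
## Assumption: most samples in a group are wild-type-like, so the group
## median approximates the missing in-group control.
normalize_by_group_median <- function(dat, group = "plate", value = "value") {
  med <- ave(dat[[value]], dat[[group]],
             FUN = function(x) median(x, na.rm = TRUE))
  dat$norm_value <- dat[[value]] / med  # ratio to the group median
  dat
}

## Hypothetical example: three plates measured at different times/scales
set.seed(1)
dat <- data.frame(
  plate = rep(c("P1", "P2", "P3"), each = 8),
  value = c(rnorm(8, 100, 5), rnorm(8, 150, 5), rnorm(8, 80, 5))
)
head(normalize_by_group_median(dat))
```

After this kind of normalization, values from different plates or batches are on a comparable scale, which is what permits cross-experiment comparison in the absence of shared controls.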

Highlights

  • High throughput methodologies such as microarrays, mass spectrometry and plate-based small molecule screens are increasingly used to facilitate discoveries from gene function to drug candidate identification

  • The approach is tolerant to repetition of both individual samples and sample groups across the course of the experiment so long as the portion of individuals showing a wild type (WT) response in any sample group is over 50%

  • As the portion of WT individuals in a sample group decreases, there will be a reduction in accuracy and a corresponding increase in false non-discovery rate (FNDR) due to the assumptions of the algorithm, as demonstrated in the Testing section below (a toy illustration follows this list)
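
The following toy simulation (not from the paper; all numbers are hypothetical) illustrates why the >50% WT assumption matters: the group median tracks the WT level only while WT samples remain in the majority, and once they do not, median-based normalization shifts toward the mutant level, inflating the FNDR.

```r
## Toy illustration of the >50% wild-type assumption (hypothetical values).
set.seed(42)
wt_level <- 100; mutant_level <- 160; n <- 96

median_by_fraction <- sapply(seq(0.9, 0.3, by = -0.1), function(frac_wt) {
  n_wt <- round(n * frac_wt)
  values <- c(rnorm(n_wt, wt_level, 5), rnorm(n - n_wt, mutant_level, 5))
  c(frac_wt = frac_wt, group_median = median(values))
})
t(median_by_fraction)
## Below frac_wt ~ 0.5 the group median jumps toward the mutant level,
## so normalizing against it masks true mutants (higher FNDR).
```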



Introduction

High throughput methodologies such as microarrays, mass spectrometry and plate-based small molecule screens are increasingly used to facilitate discoveries from gene function to drug candidate identification. These large-scale experiments are typically carried out over the course of months and years, often without the controls needed to compare directly across the dataset. High throughput screening studies in biology and other fields are increasingly popular due to ease of sample tracking and decreasing technology costs. These experimental setups enable researchers to obtain numerous measurements across multiple individuals in parallel (e.g. gene expression and diverse plate-based assays) or in series (e.g. metabolomics and proteomics platforms). Properties of the sample cohort serve as controls, with the difference between an individual and its cohort used to identify samples differentially accumulating a metabolite [6]. This strategy can streamline sample processing and maximize throughput when the expected effects are large and observable.
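
As a point of reference, a minimal sketch of such a group-based statistic is shown below: each individual is scored against its own cohort with a z-score, and samples beyond a chosen cutoff are flagged as putative hits. This is a generic illustration, not code from the Chloroplast 2010 pipeline; the column names are hypothetical.

```r
## Sketch of a group-based statistic: z-score of each sample against its cohort.
## Column names (plate, value) are hypothetical.
zscore_within_group <- function(dat, group = "plate", value = "value") {
  mu  <- ave(dat[[value]], dat[[group]], FUN = function(x) mean(x, na.rm = TRUE))
  sdv <- ave(dat[[value]], dat[[group]], FUN = function(x) sd(x, na.rm = TRUE))
  dat$z <- (dat[[value]] - mu) / sdv
  dat
}

## Samples with |z| above a chosen cutoff (e.g. 2) would be treated as
## putative hits in a first-pass screen.
```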

