Abstract
We introduce a novel technique to determine the expression state of a gene from quantitative information measuring its expression. Adopting a productive abstraction from current thinking in molecular biology, we consider two expression states for a gene: Up or Down. We determine this state using a statistical model that assumes the data behave as a combination of two biological distributions. Given a cohort of hybridizations, our algorithm predicts, for each individual reading, the probability of each gene being in an Up or a Down state in each hybridization. Using a series of publicly available gene expression data sets, we demonstrate that our algorithm outperforms the prevalent algorithm. We also show that our algorithm can be used in conjunction with expression adjustment techniques to produce a more biologically sound gene-state call. The technique we present here enables a routine update, in which the continuously evolving expression level adjustments feed into gene-state calculations. The technique can be applied in almost any multi-sample gene expression experiment, and holds equal promise for protein abundance experiments.
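The two-state idea described above can be illustrated with a minimal sketch. It assumes the two distributions are gamma distributions (the Highlights name a gamma mixture algorithm), and it fits them with an EM-style loop whose M-step uses weighted moment matching rather than exact maximum likelihood. The fitting procedure, the function names (`fit_two_gamma_mixture`, `posterior_up`), and the synthetic readings are all assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def gamma_logpdf(x, shape, scale):
    """Log-density of a gamma distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def weighted_gamma_moments(values, w):
    """Weighted method-of-moments estimate of (shape, scale)."""
    total = sum(w)
    mean = sum(wi * x for wi, x in zip(w, values)) / total
    var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, values)) / total
    var = max(var, 1e-9)  # guard against a degenerate component
    return mean * mean / var, var / mean


def posterior_up(x, weights, params):
    """Probability that reading x belongs to the Up component (index 1)."""
    log_down = math.log(weights[0]) + gamma_logpdf(x, *params[0])
    log_up = math.log(weights[1]) + gamma_logpdf(x, *params[1])
    top = max(log_down, log_up)  # subtract the max for numerical stability
    up = math.exp(log_up - top)
    down = math.exp(log_down - top)
    return up / (up + down)


def fit_two_gamma_mixture(values, n_iter=50):
    """EM-style fit; the M-step is moment matching (an assumed simplification)."""
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise Down from the lower half of the readings, Up from the upper half.
    params = [weighted_gamma_moments(part, [1.0] * len(part))
              for part in (xs[:half], xs[half:])]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of the Up component for each reading.
        resp_up = [posterior_up(x, weights, params) for x in values]
        resp_down = [1.0 - r for r in resp_up]
        # M-step: re-estimate each component from its weighted readings.
        params = [weighted_gamma_moments(values, resp_down),
                  weighted_gamma_moments(values, resp_up)]
        pi_up = min(max(sum(resp_up) / len(values), 1e-6), 1.0 - 1e-6)
        weights = [1.0 - pi_up, pi_up]
    return weights, params


# Synthetic, strictly positive readings for one gene across a cohort
# (hypothetical data: 300 low-expression and 300 high-expression samples).
rng = random.Random(0)
readings = ([rng.gammavariate(2.0, 20.0) for _ in range(300)]
            + [rng.gammavariate(8.0, 100.0) for _ in range(300)])
weights, params = fit_two_gamma_mixture(readings)
for x in (10.0, 1000.0):
    print(x, round(posterior_up(x, weights, params), 3))
```

Because the state call is a posterior probability computed from the readings themselves, a sketch like this can be rerun on readings that have first been adjusted by any normalization method, which is the kind of routine update the abstract describes.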
Highlights
In examining a gene, either individually or in system-wide characterizations, it is useful to generalize its “state”
Of the 22,283 probes examined in the experiment, the MAS5 algorithm was consistent in assigning the same Present/Absent call, across all samples, for each of 17,004 probes; the remaining 5,278 probes were assigned inconsistent calls by MAS5
The gamma mixture (GM) algorithm showed an improvement of 55% in consistency
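The consistency measure used in these highlights (a probe counts as consistent when it receives the same call in every sample) can be sketched as follows. The probe names and toy calls are hypothetical; only the two counts in `mas5_rate` come from the text above. How the 55% improvement is measured is not stated here, so the sketch does not attempt to reproduce it.

```python
def consistency_rate(calls_by_probe):
    """Fraction of probes whose call is identical in every sample."""
    consistent = sum(1 for calls in calls_by_probe.values()
                     if len(set(calls)) == 1)
    return consistent / len(calls_by_probe)


# Hypothetical Present ("P") / Absent ("A") calls for four probes in three samples.
toy_calls = {
    "probe_a": ["P", "P", "P"],
    "probe_b": ["A", "A", "A"],
    "probe_c": ["P", "A", "P"],
    "probe_d": ["A", "A", "P"],
}
toy_rate = consistency_rate(toy_calls)

# MAS5 figures quoted above: 17,004 consistent probes out of 22,283.
mas5_rate = 17004 / 22283
print(f"toy: {toy_rate:.2f}, MAS5: {mas5_rate:.3f}")
```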
Summary
In examining a gene, either individually or in system-wide characterizations, it is useful to generalize its “state”. A gene’s Present/Absent call is a common dimension of the reported results of gene-expression microarray experiments. Such calls tag each probe set in the microarray with a determination of whether the probe set is expressed (Present) or unexpressed (Absent) in the sampled tissue [1]. Present/Absent calls are often used in filtering out false positives from the large collection of probes on an expression array. The most commonly used approach to making such calls is the MAS5 algorithm [1], part of the Affymetrix™ collection of software tools [2]. While some recent experimental findings support the use of the MAS5 algorithm [3], MAS5 has some significant shortcomings. Because MAS5 does not operate on adjusted readings, it cannot benefit from the increasingly sophisticated techniques for adjusting gene expression readings (e.g. RMA [4] and others [5]; see [6] for a comparison of techniques).