Abstract

Background: Developing efficient analytic methodologies for combining microarray results is a major challenge in gene expression analysis. The widely used effect size models are thought to provide an efficient modeling framework for this purpose: the measures of association for each study and each gene are combined, weighted by their standard errors. A significant disadvantage of this strategy is that the quality of different data sets may be highly variable, yet this information is usually neglected during integration. Moreover, the standard deviations estimated in commonly used effect size measures (such as the standardized mean difference) can be unstable when sample sizes in each group are small.

Results: We propose a re-parameterization of the traditional mean-difference-based effect measure that uses the log ratio of means as the effect size for each gene in each study. The estimated effect sizes for all studies are then combined under two modeling frameworks: quality-unweighted random effects models and quality-weighted random effects models. We define the quality measure as a function of the detection p-value, which indicates whether a transcript is reliably detected on the Affymetrix gene chip. The new effect size measure is evaluated and compared under the quality-weighted and quality-unweighted data integration frameworks using simulated data sets and several data sets of prostate cancer patients and controls. We focus on identifying differentially expressed biomarkers for prediction of cancer outcomes.

Conclusion: Our results show that the proposed effect size measure (log ratio of means) has better power to identify differentially expressed genes than the commonly used standardized mean difference (SMD), and that the genes it detects perform better in predicting cancer outcomes, under both quality-weighted and quality-unweighted data integration frameworks. The new effect size measure and the quality-weighted microarray data integration framework provide efficient ways to combine microarray results.
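To make the idea of combining a log-ratio-of-means effect size across studies concrete, the following is a minimal Python sketch for a single gene. It is not the authors' implementation. The function names are hypothetical, the variance of the log ratio uses the standard first-order delta-method approximation, the between-study variance uses the DerSimonian-Laird estimator, and the optional quality weighting simply multiplies each study's inverse-variance weight by a score in [0, 1]. None of these choices is specified in the text above, so all of them should be read as assumptions.

```python
import math
from statistics import mean, variance


def log_ratio_of_means(case, control):
    """Effect size log(mean_case / mean_control) with a delta-method variance.

    Assumes expression values are on the raw (positive) scale.
    """
    m_case, m_ctrl = mean(case), mean(control)
    if m_case <= 0 or m_ctrl <= 0:
        raise ValueError("group means must be positive to take a log ratio")
    effect = math.log(m_case / m_ctrl)
    # First-order approximation: Var(log m) ~ s^2 / (n * m^2) for each group.
    var = (variance(case) / (len(case) * m_case ** 2)
           + variance(control) / (len(control) * m_ctrl ** 2))
    return effect, var


def pool_random_effects(effects, variances, quality=None):
    """Pool per-study effects for one gene; returns (pooled, se, tau2).

    quality: optional per-study scores in [0, 1] (assumed form of weighting).
    """
    k = len(effects)
    if quality is None:
        quality = [1.0] * k  # quality-unweighted model
    # DerSimonian-Laird estimate of the between-study variance tau^2.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q_stat = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q_stat - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    # Quality-scaled inverse-variance weights.
    total_var = [v + tau2 for v in variances]
    w_star = [qi / tv for qi, tv in zip(quality, total_var)]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    # Variance of a weighted mean with fixed weights w_star.
    se = math.sqrt(sum(wi ** 2 * tv for wi, tv in zip(w_star, total_var))) / sw_star
    return pooled, se, tau2
```

Calling `log_ratio_of_means` once per study and passing the resulting effects and variances to `pool_random_effects` gives a pooled estimate whose ratio to its standard error can serve as a per-gene test statistic. With `quality=None` the function reduces to an ordinary random effects model.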

Highlights

  • Analysis of simulated data sets: we evaluated the performance of our method using simulated Affymetrix probe-level expression data generated from a model incorporating probe-level effects, optical noise, and non-specific binding, as well as true signals [31,36].

  • Following the simulation procedures described in the Methods section, we ran three simulation models for probe-level gene expression profiles generated from two independent studies.

Summary

Introduction

Development of efficient analytic methodologies for combining microarray results is a major challenge in gene expression analysis. Each data set is first preprocessed to clean and align the signals, and these preprocessed data sets are put together so that the integrated data set can be treated as though it comes from a single study. In this way, the effective sample size is greatly increased. Wang et al. [17] standardized gene expression levels based on the means and standard deviations of expression measurements from the arrays of healthy prostate samples. These methods are simple, and in many cases, if the transformation is carefully made, the performance of disease outcome prediction can be improved [14]. However, there is no consensus or clear guideline on the best way to perform the necessary data transformations.
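As an illustration of the kind of transformation described above, the sketch below standardizes one gene's expression values against a reference group of healthy samples. The exact procedure of Wang et al. [17] is not given in this text, so the per-gene z-score form and the function name `standardize_to_reference` are assumptions made for illustration only.

```python
from statistics import mean, stdev


def standardize_to_reference(values, reference):
    """Z-score one gene's values using the mean and SD of reference samples.

    values: expression of the gene in the samples to transform.
    reference: expression of the same gene in healthy (reference) samples.
    """
    mu, sd = mean(reference), stdev(reference)
    if sd == 0:
        raise ValueError("reference samples have zero variance for this gene")
    return [(v - mu) / sd for v in values]
```

Applying the same transformation within each study, gene by gene, puts the studies on a common scale before they are merged. Whether that is appropriate depends on how comparable the reference groups are across studies.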
