Abstract

Background: The identification of differentially expressed genes (DEGs) from Affymetrix GeneChip arrays is currently done by first computing expression levels from the low-level probe intensities, then deriving significance by comparing these expression levels between conditions. The proposed PL-LM (Probe-Level Linear Model) method applies a linear model to the probe-level data to estimate the treatment effect directly. A finite mixture of Gaussian components is then used to identify DEGs from the coefficients estimated by the linear model. This approach can readily be applied to experimental designs with or without replication.

Results: On a wholly defined dataset, the PL-LM method was able to identify 75% of the differentially expressed genes with 10% false positives. This accuracy was achieved both with the three replicates per condition available in the dataset and with only one replicate per condition.

Conclusion: On this dataset, the method achieves a higher accuracy than the best set of tools identified by the authors of the dataset, and does so using only one replicate per condition.

Highlights

  • The identification of differentially expressed genes (DEGs) from Affymetrix GeneChip arrays is currently done by first computing expression levels from the low-level probe intensities, then deriving significance by comparing these expression levels between conditions.

  • DNA microarrays are commonly used to measure, in parallel, the steady-state concentration of tens of thousands of mRNAs, providing an estimate of the level of expression of the corresponding genes. They come in two flavors: 1) spotted arrays, which allow the simultaneous measurement of two samples on the same array and are referred to here as multi-channel arrays; 2) Affymetrix GeneChip arrays, which have a significantly higher density but allow the hybridization of only one sample and are referred to here as High-Density Arrays (HDAs).

  • By adapting and combining previously proposed approaches, the PL-LM method outperformed, by a significant margin, the preferred method identified by Choe et al. [7] on their validation dataset: MAS5.0 background correction and PM adjustment, median polish expression summaries, loess normalization, and Cyber-T [8].


Summary

Introduction

The identification of differentially expressed genes (DEGs) from Affymetrix GeneChip arrays is currently done by first computing expression levels from the low-level probe intensities, then deriving significance by comparing these expression levels between conditions. In the proposed PL-LM method, a finite mixture of Gaussian components is used to identify DEGs from the coefficients estimated by a probe-level linear model. This approach can readily be applied to experimental designs with or without replication. DNA microarrays are commonly used to measure, in parallel, the steady-state concentration of tens of thousands of mRNAs, providing an estimate of the level of expression of the corresponding genes. They come in two flavors: 1) spotted arrays, which allow the simultaneous measurement of two samples on the same array and are referred to here as multi-channel arrays; 2) Affymetrix GeneChip arrays, which have a significantly higher density but allow the hybridization of only one sample and are referred to here as High-Density Arrays (HDAs). A typical piece of information that investigators seek to extract from microarrays is the list of differentially expressed genes (DEGs) between a treatment and a control condition. Methods applied to log-ratios on multi-channel arrays can readily be applied to these computed ratios.
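
The two steps described above (a probe-level linear model, then a Gaussian mixture on its coefficients) can be illustrated with a small sketch. This is not the authors' implementation: the additive model form (log2 intensity = probe effect + condition effect + noise), the choice of two zero-mean mixture components, the 0.5 posterior cutoff, the simulated data, and all function names are assumptions made for illustration. It covers only the one-replicate-per-condition case.

```python
import math
import random
from statistics import mean


def treatment_effect(control, treated):
    """Least-squares condition effect for one probe set (assumed additive model).

    With one array per condition, the least-squares estimate reduces to the
    mean of the per-probe log2 differences, so probe affinities cancel out.
    """
    return mean(math.log2(t) - math.log2(c) for c, t in zip(control, treated))


def normal_pdf(x, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_mixture(effects, n_iter=100):
    """EM for an assumed two-component zero-mean Gaussian mixture.

    The narrow component models unchanged genes, the wide one models DEGs.
    Returns the posterior probability that each gene belongs to the wide one.
    """
    rms = math.sqrt(mean(e * e for e in effects))
    s_null, s_deg, p_deg = 0.5 * rms, 2.0 * rms, 0.5
    post = []
    for _ in range(n_iter):
        # E-step: posterior membership in the wide (DEG) component.
        post = []
        for e in effects:
            a = p_deg * normal_pdf(e, s_deg)
            b = (1.0 - p_deg) * normal_pdf(e, s_null)
            post.append(a / max(a + b, 1e-300))
        # M-step: update the mixing weight and both standard deviations.
        w_deg = sum(post)
        w_null = len(effects) - w_deg
        p_deg = w_deg / len(effects)
        ss_deg = sum(r * e * e for r, e in zip(post, effects))
        ss_null = sum((1.0 - r) * e * e for r, e in zip(post, effects))
        s_deg = max(math.sqrt(ss_deg / max(w_deg, 1e-12)), 1e-6)
        s_null = max(math.sqrt(ss_null / max(w_null, 1e-12)), 1e-6)
    return post


def simulate(n_genes=200, n_deg=40, n_probes=11, seed=0):
    """Hypothetical data: the first n_deg genes are shifted by +/-2 on the log2 scale."""
    rng = random.Random(seed)
    data = []
    for g in range(n_genes):
        shift = rng.choice([-2.0, 2.0]) if g < n_deg else 0.0
        base = rng.uniform(7, 12)
        affinity = [rng.gauss(0, 1) for _ in range(n_probes)]
        control = [2 ** (base + a + rng.gauss(0, 0.1)) for a in affinity]
        treated = [2 ** (base + a + shift + rng.gauss(0, 0.1)) for a in affinity]
        data.append((control, treated))
    return data


data = simulate()
effects = [treatment_effect(c, t) for c, t in data]
posterior = fit_mixture(effects)
called = [g for g, p in enumerate(posterior) if p > 0.5]
print(len(called), "genes called differentially expressed")
```

Working on per-probe differences is what lets the sketch run without replicates: the probes within a probe set supply the repeated measurements that the variance estimate would otherwise take from replicate arrays.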

