Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution

Jaroslav Novák, Eduardo Blumwald, Juan Hidalgo, Olga Modlich, Jun Xu, Charles Wang, Milena Penkowa, Marjan Boerma, Zoran Gatalica, David Twell, David Honys, Jordan B Sottosanto, Manuel G Cosio, Joan L Slonczewski, Douglas A Bell, Michael S Rolph, Mia Miller, Michael J Szego, Fred R Blattner, David J Volsky, René St‐Arnaud, Seon‐Young Kim, Roderick R McInnes, Marián Hajdúch

doi:10.1186/1745-6150-1-27

Abstract

Background
DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide scope of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of the observed differences in gene expression. However, until now little attention has been given to the characterization of the dispersion of DNA microarray data.

Results
Here we examine the expression data obtained from 682 Affymetrix GeneChips® of 22 different types and demonstrate that the Gaussian (normal) frequency distribution is characteristic of the variability of gene expression values. However, typically 5 to 15% of the samples deviate from normality. Furthermore, it is shown that the frequency distributions of the difference of expression in subsets of ordered, consecutive pairs of genes (consecutive samples) in pair-wise comparisons of replicate experiments are also normal. We describe a consecutive sampling method, which is employed to calculate the characteristic function approximating the standard deviation, and show that the standard deviation derived from the consecutive samples is equivalent to the standard deviation obtained from individual genes. Finally, we determine the boundaries of probability intervals and demonstrate that the coefficients defining the intervals are independent of sample characteristics, variability of data, laboratory conditions and type of chips. These coefficients are very closely correlated with Student's t-distribution.

Conclusion
In this study we ascertained that the non-systematic variations possess a Gaussian distribution, determined the probability intervals and demonstrated that the Kα coefficients defining these intervals are invariant; these coefficients offer a convenient universal measure of dispersion of data. The fact that the Kα distributions are so close to the t-distribution and independent of conditions and type of arrays suggests that the quantitative data provided by Affymetrix technology give a "true" representation of the physical processes involved in the measurement of RNA abundance.

Reviewers
This article was reviewed by Yoav Gilad (nominated by Doron Lancet), Sach Mukherjee (nominated by Sandrine Dudoit) and Amir Niknejad and Shmuel Friedland (nominated by Neil Smalheiser).
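
The consecutive sampling idea described in the Results can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper `consecutive_sample_sd`, its non-overlapping windows of `window` genes, and the division by √2 (which assumes both replicates carry independent errors of equal size) are choices made here for illustration only. Genes from two replicate arrays are ordered by their mean expression, split into consecutive groups, and the spread of the replicate differences in each group yields one point of an intensity-dependent standard deviation curve.

```python
import random
import statistics


def consecutive_sample_sd(x, y, window=100):
    """Estimate SD as a function of intensity from two replicate arrays.

    Hypothetical helper: genes are ordered by mean intensity, grouped into
    consecutive, non-overlapping windows of `window` genes, and the spread of
    the replicate differences in each window is converted to a per-measurement SD.
    """
    if len(x) != len(y):
        raise ValueError("replicates must contain the same genes")
    # Order gene indices by average expression across the two replicates.
    order = sorted(range(len(x)), key=lambda i: (x[i] + y[i]) / 2)
    curve = []
    for start in range(0, len(order) - window + 1, window):
        idx = order[start:start + window]
        diffs = [y[i] - x[i] for i in idx]
        level = statistics.fmean([(x[i] + y[i]) / 2 for i in idx])
        # Var(y - x) = 2 * sigma^2 when both replicates carry independent
        # errors of equal size, so divide the SD of differences by sqrt(2).
        sd = statistics.stdev(diffs) / 2 ** 0.5
        curve.append((level, sd))
    return curve


if __name__ == "__main__":
    rng = random.Random(0)
    true_sd = 0.3  # assumed error SD of the simulated data
    signal = [rng.uniform(4, 14) for _ in range(2000)]  # simulated log2-like levels
    rep_a = [s + rng.gauss(0, true_sd) for s in signal]
    rep_b = [s + rng.gauss(0, true_sd) for s in signal]
    for level, sd in consecutive_sample_sd(rep_a, rep_b, window=200):
        print(f"mean level {level:5.2f}  estimated SD {sd:.3f}")
```

With simulated constant-SD data, each window's estimate should land near the true SD of 0.3. On real data the estimates need not be constant across intensity, which is why the sketch returns a curve of (level, SD) pairs rather than a single value.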

Highlights

DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide scope of other investigations
Our main objective was to study microarray data derived from particular biological investigations and generated in many different microarray core laboratories, rather than sets of arrays produced in the context of technology development or the testing of analysis methods
Probability intervals and correlation of the Kα coefficients with the t-distribution: once the standard deviation function has been evaluated, the limits of the probability intervals can be determined, i.e. the boundaries lying at a distance from the 45° axis of symmetry equal to a constant number of standard deviations
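
The geometry of these probability intervals can be made concrete with a short sketch. On a scatter plot of replicate A against replicate B, a gene at (x, y) lies at the perpendicular distance (y − x)/√2 from the 45° axis; if each replicate has independent error with SD σ, that distance itself has SD σ, so a boundary at K standard deviations is the set of points with |y − x|/√2 = Kσ. The helpers below are hypothetical: `gaussian_k` gives the coefficient a pure normal model would predict, and `empirical_k` reads an analogous coefficient off simulated data with a single, known σ. Neither reproduces the article's Kα values.

```python
import math
import random
from statistics import NormalDist


def distance_from_diagonal(x, y):
    """Signed perpendicular distance of the point (x, y) from the line y = x."""
    return (y - x) / math.sqrt(2)


def gaussian_k(alpha):
    """K with P(|Z| > K) = alpha for a standard normal Z (two-sided)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def empirical_k(x, y, sigma, alpha):
    """Hypothetical empirical coefficient: the (1 - alpha) quantile of |distance| / sigma.

    Assumes a single, known per-replicate SD `sigma`; in practice sigma would
    come from an intensity-dependent SD function.
    """
    scaled = sorted(abs(distance_from_diagonal(a, b)) / sigma for a, b in zip(x, y))
    # Nearest-rank quantile: adequate for a sketch, not a careful estimator.
    rank = max(math.ceil((1 - alpha) * len(scaled)) - 1, 0)
    return scaled[rank]


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = 0.25  # assumed per-replicate error SD of the simulated data
    signal = [rng.uniform(4, 14) for _ in range(20000)]
    rep_a = [s + rng.gauss(0, sigma) for s in signal]
    rep_b = [s + rng.gauss(0, sigma) for s in signal]
    for alpha in (0.10, 0.05, 0.01):
        print(f"alpha={alpha:.2f}  Gaussian K={gaussian_k(alpha):.3f}  "
              f"empirical K={empirical_k(rep_a, rep_b, sigma, alpha):.3f}")
```

For α = 0.05 the Gaussian coefficient is about 1.96. The article reports that its empirical Kα coefficients follow Student's t-distribution closely; since the t-distribution has heavier tails than the normal at small degrees of freedom, the normal value here serves only as a large-sample baseline.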

Summary

Introduction

DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide scope of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of the observed differences in gene expression. A different model, called Robust Multiarray Analysis (RMA), was proposed by Speed, Bolstad, Irizarry and co-workers [5,6,7] (see Bolstad, B.M., 2004, PhD Thesis, University of California, Berkeley). It applies a log transform to the data, which implicitly assumes that the error is proportional to the signal intensity. Once the representative value of the gene expression is known, standard statistical methods of comparison can be used for "high level" analysis of the observed differences. Such undesirable effects are often significant and can be detected only by detailed comparisons of the individual replicate samples.
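
The assumption behind the log transform mentioned above can be checked with a small simulation. The sketch assumes purely multiplicative noise (observed = signal × exp(e)), an idealization chosen here for illustration; `replicate_spread` is a hypothetical helper, and this is not an implementation of RMA, which also involves background correction, normalization and probe-level summarization.

```python
import math
import random
import statistics


def replicate_spread(signal, sigma, n=2000, seed=0):
    """Return (raw-scale SD, log2-scale SD) of n simulated replicate measurements.

    Assumes purely multiplicative noise: observed = signal * exp(e), e ~ N(0, sigma^2).
    """
    rng = random.Random(seed)
    observed = [signal * math.exp(rng.gauss(0, sigma)) for _ in range(n)]
    raw_sd = statistics.stdev(observed)
    log_sd = statistics.stdev([math.log2(v) for v in observed])
    return raw_sd, log_sd


if __name__ == "__main__":
    for signal in (100, 1_000, 10_000):
        raw_sd, log_sd = replicate_spread(signal, sigma=0.2)
        print(f"signal {signal:>6}: raw SD {raw_sd:9.2f}   log2 SD {log_sd:.4f}")
```

Because the same random draws are reused for each signal level, the raw-scale SD grows in direct proportion to the signal while the log2-scale SD stays at roughly σ/ln 2 ≈ 0.29 for every level: error that is proportional to intensity becomes constant after the log transform.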



Journal: Biology Direct
Publication Date: Jan 1, 2006
Citations: 57
License type: cc-by


Similar Papers

Expression profiling: Opportunities and pitfalls and impact on the study and management of allergic diseases
Santa Jeremy Ono ... Gilbert Jay
Journal of Allergy and Clinical Immunology | VOL. 112 | 01 Dec 2003

PTU-030 RNA sequencing of colorectal cancer biospecimens characterises differential gene expression in old and young patients
Joanna Anderson ... Lennard Lee
01 Jun 2018

Biophysical models of ciliary activity: Gaussian frequency distributions
P Thyberg ... L.G Wiman
European Biophysics Journal | VOL. 18 | 01 Mar 1990

Mechanisms of Endothelial Cell Heterogeneity in Health and Disease
William C Aird
Circulation Research | VOL. 98 | 03 Feb 2006
