Abstract
Background
Gene expression experiments are common in molecular biology, for example to identify genes that play a certain role in a specified biological framework. For that purpose, the expression levels of several thousand genes are measured simultaneously using DNA microarrays. To detect the genes that are differentially expressed between two distinct groups of tissue samples, one statistical test is performed per gene, and the resulting p-values are adjusted to control the false discovery rate (FDR). In addition, the expression change of each gene is quantified by an effect measure, typically the log fold change. In certain cases, however, a gene with a significant p-value can have a rather small fold change, while in other cases a non-significant gene can have a rather large fold change. The biological relevance of a change in gene expression can be judged more intuitively from a fold change than from a p-value alone. Therefore, confidence intervals for the log fold change that accompany the adjusted p-values are desirable.
Results
In a new approach, we employ an existing algorithm for adjusting confidence intervals in the case of high-dimensional data and apply it to a widely used linear model for microarray data. Furthermore, we adopt a concept of different relevance categories for effects in clinical trials to assess the biological relevance of genes in microarray experiments. In a brief simulation study, the properties of the adjusting algorithm are maintained when it is combined with the linear model for microarray data. In two cancer data sets, the adjusted confidence intervals can indicate the significance of large fold changes and distinguish them from other large but non-significant fold changes. Adjusting the confidence intervals also corrects the assessment of biological relevance.
Conclusions
Our new combination approach and the categorization of fold changes facilitate the selection of genes in microarray experiments and help to interpret their biological relevance.
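The abstract does not spell out the relevance categories it adopts from clinical trials. As a purely hypothetical sketch of how such a categorization could work, the function below classifies a confidence interval for a log2 fold change against an assumed relevance threshold `delta` (1.0, i.e. a two-fold change). The names `Relevance`, `categorize` and `delta`, the category labels, and the boundaries are illustrative assumptions, not the authors' definitions.

```python
from enum import Enum


class Relevance(Enum):
    """Hypothetical relevance categories for a CI of a log2 fold change (assumed, not the paper's)."""
    NOT_SIGNIFICANT = "not significant"                   # interval contains 0
    NOT_RELEVANT = "significant, not relevant"            # excludes 0, lies inside (-delta, delta)
    POSSIBLY_RELEVANT = "significant, possibly relevant"  # excludes 0, crosses +/-delta
    RELEVANT = "significant and relevant"                 # lies entirely beyond +/-delta


def categorize(lower: float, upper: float, delta: float = 1.0) -> Relevance:
    """Assign a category from the interval [lower, upper] and a threshold delta.

    delta = 1.0 on the log2 scale corresponds to a two-fold change (an assumed value).
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower <= 0.0 <= upper:
        return Relevance.NOT_SIGNIFICANT
    # The interval lies on one side of 0; express its bounds as distances from 0.
    near, far = (lower, upper) if lower > 0.0 else (-upper, -lower)
    if near >= delta:
        return Relevance.RELEVANT
    if far < delta:
        return Relevance.NOT_RELEVANT
    return Relevance.POSSIBLY_RELEVANT


if __name__ == "__main__":
    # Hypothetical intervals around the same estimate (1.7): unadjusted vs. wider adjusted.
    print(categorize(1.1, 2.3).value)  # significant and relevant
    print(categorize(0.4, 3.0).value)  # significant, possibly relevant
```

Because an adjusted interval is wider than the unadjusted one around the same estimate, adjustment can move a gene into a weaker category, as in the example where an estimate of 1.7 drops from "relevant" to "possibly relevant".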
Highlights
In our simulation study, the false coverage-statement rate (FCR) and the conditional coverage probability (CCP) behaved as described by Benjamini and Yekutieli [3] for a similar setting.
When genes are selected by BH-adjusted p-values and the related BH-adjusted confidence intervals are constructed, the CCP increases with increasing fold change (Figure 1, left).
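The behavior described in these highlights can be explored with a small toy simulation. This is a sketch under simplifying assumptions and not the authors' simulation design: estimates are independent, X_i ~ N(theta_i, 1) with known unit standard error (a stand-in for log fold changes; the paper combines the algorithm with a linear model instead), genes are selected by the Benjamini-Hochberg procedure, and every selected gene gets a marginal interval at the common level 1 - Rq/m, which Benjamini and Yekutieli [3] show controls the FCR at level q for independent estimates. The function names, effect sizes, and q = 0.1 are hypothetical choices.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bh_select(pvalues, q):
    """Indices selected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    r = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            r = rank  # keep the largest rank that meets the BH condition
    return order[:r]


def simulate(effects, q=0.1, reps=200, seed=1):
    """Monte Carlo estimate of the FCR and of the CCP per true effect size.

    Toy model (an assumption): independent X_i ~ N(theta_i, 1) with known unit SE.
    """
    rng = random.Random(seed)
    m = len(effects)
    fcp_sum = 0.0
    n_selected = {theta: 0 for theta in set(effects)}
    n_covered = {theta: 0 for theta in set(effects)}
    for _ in range(reps):
        x = [rng.gauss(theta, 1.0) for theta in effects]
        p = [2.0 * (1.0 - STD_NORMAL.cdf(abs(xi))) for xi in x]
        sel = bh_select(p, q)
        r = len(sel)
        if r == 0:
            continue  # the false coverage proportion is 0 when nothing is selected
        # Same adjusted level 1 - r*q/m for all selected parameters.
        half_width = STD_NORMAL.inv_cdf(1.0 - r * q / (2.0 * m))
        misses = 0
        for i in sel:
            covered = abs(x[i] - effects[i]) <= half_width
            misses += 0 if covered else 1
            n_selected[effects[i]] += 1
            n_covered[effects[i]] += 1 if covered else 0
        fcp_sum += misses / r
    fcr = fcp_sum / reps
    ccp = {t: n_covered[t] / n_selected[t] for t in n_selected if n_selected[t] > 0}
    return fcr, ccp


if __name__ == "__main__":
    # Hypothetical design: 50 genes each with true effect 0, 1, 2 and 4 (in SE units).
    effects = [0.0] * 50 + [1.0] * 50 + [2.0] * 50 + [4.0] * 50
    fcr, ccp = simulate(effects, q=0.1)
    print(f"estimated FCR: {fcr:.3f} (target q = 0.1)")
    for theta in sorted(ccp):
        print(f"effect {theta:.0f}: CCP {ccp[theta]:.3f}")
```

In this toy setting the estimated FCR stays below q, and the CCP is close to 0 for selected null genes and rises toward the nominal level for large effects, mirroring the qualitative pattern described above.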
Summary
Gene expression experiments are common in molecular biology, for example to identify genes that play a certain role in a specified biological framework. When a large number of hypotheses is tested simultaneously, a high number of false positive test results is expected. This applies in the case of high-dimensional data, where the number m of features is much larger than the available sample size n. Gene expression levels from DNA microarrays are a prime example of high-dimensional data. […] confidence intervals are not comparable to FDR-adjusted p-values. A similar algorithm was introduced by Jung et al. [4], who studied adjusted confidence intervals for the fold change of protein expression levels. Although the latter algorithm produces adjusted confidence intervals that match their related adjusted p-values, in the sense that both lead to the same test decision, it has the drawback of gene-specific confidence levels. The algorithm of Benjamini and Yekutieli [3] uses the same adjusted confidence level for all genes.
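To illustrate how a common adjusted level for all selected genes can still agree with the FDR-adjusted p-values, the sketch below BH-adjusts the p-values, selects the R genes with adjusted p-value at most q, and gives each of them an interval at level 1 - Rq/m. It is a minimal sketch assuming a normal approximation (z = log fold change / standard error); with the moderated t statistics of a linear model for microarray data, the normal quantile would presumably be replaced by the corresponding t quantile. The function names and the six genes' estimates and standard errors are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def by_adjusted_intervals(lfc, se, q=0.05):
    """Select genes by BH at level q; give each a CI at the common level 1 - R*q/m.

    Normal approximation (assumption): z = lfc / se.
    Returns the BH-adjusted p-values and {gene index: (lower, upper)} for selected genes.
    """
    m = len(lfc)
    p = [2.0 * (1.0 - STD_NORMAL.cdf(abs(b / s))) for b, s in zip(lfc, se)]
    p_adj = bh_adjust(p)
    selected = [i for i in range(m) if p_adj[i] <= q]
    r = len(selected)
    intervals = {}
    if r > 0:
        z = STD_NORMAL.inv_cdf(1.0 - r * q / (2.0 * m))  # same quantile for every selected gene
        intervals = {i: (lfc[i] - z * se[i], lfc[i] + z * se[i]) for i in selected}
    return p_adj, intervals


if __name__ == "__main__":
    # Hypothetical log2 fold changes and standard errors for six genes.
    lfc = [2.1, -1.8, 0.4, 1.2, -0.1, 0.9]
    se = [0.5, 0.45, 0.3, 0.35, 0.25, 0.6]
    p_adj, intervals = by_adjusted_intervals(lfc, se, q=0.05)
    for i, (lo, hi) in sorted(intervals.items()):
        print(f"gene {i}: log2FC {lfc[i]:+.2f}, adj. p {p_adj[i]:.4f}, CI [{lo:.2f}, {hi:.2f}]")
```

Every BH-selected gene satisfies p_i <= Rq/m, so under this approximation |z_i| >= z_{1-Rq/(2m)} and its interval excludes 0: the intervals lead to the same test decisions as the adjusted p-values, even though all of them share one confidence level.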