It is a well-known problem that standard techniques for analysing DNA chip data misspecify genes. In particular, genes that are confirmed to be active often do not show up as potential candidates. This is possibly due to non-homogeneous distributions of expression levels over the whole expression range. We introduce a method that allows the detection of genes based on a self-adaptive threshold. The threshold is determined for equally populated expression bands by assuming a normal distribution of the logarithms of expression level ratios. By specifying a significance level, the threshold is set according to 'local' expression statistics within a band. We call this method the relative variance method (RVM). We derive a test statistic for the RVM and compare it with other methods. On this statistical basis, we show that the RVM is a complementary approach to the t-test, significance analysis of microarrays (SAM) and empirical Bayes analysis of microarrays (EBAM). The RVM should be particularly useful for experiments with small sample sizes. Using a clinical dataset, we demonstrate that the RVM can correctly identify known marker genes that are not found by the t-test, SAM or EBAM. In situations with limited sample material and a small number of replicates, as is often the case in clinical datasets, the proposed RVM identifies potential candidate genes more reliably.
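The banded, self-adaptive threshold described above can be illustrated with a minimal sketch. It is not the test statistic derived in the paper, which the abstract does not state. The function name `rvm_candidates`, the parameters `n_bands` and `alpha`, the use of mean log2 expression to order genes into bands, and the two-sided cut-off at a normal quantile times the band standard deviation are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def rvm_candidates(expr_a, expr_b, n_bands=4, alpha=0.05):
    """Return indices of genes whose log-ratio exceeds a band-local threshold.

    Hypothetical sketch, not the authors' implementation. Assumptions:
    - expr_a and expr_b hold positive expression levels of the same genes
      in two conditions;
    - genes are ordered by mean log2 expression and split into n_bands
      bands of (nearly) equal size;
    - within a band, log2 ratios are treated as normally distributed, and a
      gene is flagged if its log-ratio lies further from the band mean than
      z(1 - alpha/2) band standard deviations.
    """
    if len(expr_a) != len(expr_b):
        raise ValueError("expr_a and expr_b must have the same length")
    n = len(expr_a)

    # Log-ratio per gene, and the expression level used to assign bands.
    log_ratio = [math.log2(a / b) for a, b in zip(expr_a, expr_b)]
    level = [0.5 * (math.log2(a) + math.log2(b)) for a, b in zip(expr_a, expr_b)]

    # Gene indices sorted from lowest to highest expression level.
    order = sorted(range(n), key=level.__getitem__)

    # Two-sided standard normal quantile for the chosen significance level.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    flagged = []
    for k in range(n_bands):
        # Equal-population slicing of the sorted gene indices.
        band = order[k * n // n_bands:(k + 1) * n // n_bands]
        if len(band) < 2:
            continue  # a standard deviation needs at least two genes
        ratios = [log_ratio[i] for i in band]
        mu, sigma = mean(ratios), stdev(ratios)
        flagged.extend(i for i in band if abs(log_ratio[i] - mu) > z * sigma)
    return sorted(flagged)
```

With `n_bands=1` the sketch reduces to a single global threshold, so comparing the two settings on the same data shows how a gene in a low-variance band can be flagged locally even though a global cut-off misses it.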