In this investigation we used statistical methods to select genes with expression profiles that partition classes and subclasses of biological samples. Gene expression data corresponding to liver samples from rats treated for 24 h with an enzyme inducer (phenobarbital) or a peroxisome proliferator (clofibrate, gemfibrozil or Wyeth 14,643) were subjected to a modified Z-score test to identify gene outliers and to a binomial distribution to reduce the probability of detecting genes as differentially expressed by chance. Hierarchical clustering of 238 statistically valid differentially expressed genes partitioned class-specific gene expression signatures into groups that clustered samples exposed to the enzyme inducer or to peroxisome proliferators. Using analysis of variance (ANOVA) and linear discriminant analysis methods, we identified single genes as well as coupled gene expression profiles that separated the phenobarbital-treated from the peroxisome proliferator-treated samples and discerned the fibrate (gemfibrozil and clofibrate) subclass of peroxisome proliferators. A comparison of genes ranked by ANOVA with genes assessed as significant by mixed linear models analysis [J. Comput. Biol. 8 (2001) 625] or ranked by information gain revealed good congruence among the top 10 genes from each statistical method in the contrast between phenobarbital and peroxisome proliferator expression profiles. We propose building upon a classification regimen composed of analysis of replicate data, outlier diagnostics and gene selection procedures to utilize cDNA microarray data to categorize subclasses of samples exposed to pharmacologic agents.
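As a rough illustration of the outlier and chance-detection steps named above, the following Python sketch assumes the common median/MAD form of the modified Z-score (with the conventional 0.6745 scaling constant) and a simple binomial tail probability for a gene being flagged in at least k of n replicate hybridizations. The function names, the per-replicate chance probability `p`, and the example values are hypothetical; the exact formulation and thresholds used in the study are not given in this abstract.

```python
from math import comb
from statistics import median


def modified_z_scores(values):
    """Median/MAD-based modified Z-scores (assumed form, not confirmed by the source)."""
    med = median(values)
    # Median absolute deviation from the median
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: report zero scores rather than divide by zero
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance a gene is flagged in at least k of n replicates."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical log-ratios for five genes on one array; the last is an obvious outlier.
log_ratios = [1, 2, 3, 4, 100]
scores = modified_z_scores(log_ratios)

# Hypothetical setting: a gene flagged in all 4 replicates, each with a 5% chance rate.
chance = binomial_tail(4, 4, 0.05)
```

Under these assumptions, a gene would be retained only if its modified Z-score exceeds a chosen cutoff and the binomial probability of seeing it flagged that often by chance is small.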