Abstract
Background: Affymetrix GeneChips typically contain multiple probe sets per gene, defined as sibling probe sets in this study. These probe sets may or may not behave similarly across treatments. How best to consolidate sibling probe sets for analysis is an open problem. We propose an Analysis of Variance (ANOVA) framework to decide which sibling probe sets can be consolidated.
Results: The ANOVA model allows us to separate sibling probe sets into two types: those that behave similarly across treatments and those that behave differently across treatments. We found that consolidating sibling probe sets of the former type results in a large increase in the number of differentially expressed genes under various statistical criteria. The approach to selecting sibling probe sets suitable for consolidation is implemented in the R language and is freely available.
Conclusion: Our ANOVA analysis of sibling probe sets provides a statistical framework for selecting sibling probe sets for consolidation. Consolidating sibling probe sets by pooling data from each probe set greatly improves the estimate of a gene's expression level and results in the identification of more biologically relevant genes. Sibling probe sets that do not qualify for consolidation may represent annotation errors or other artifacts, or may correspond to differentially processed transcripts of the same gene that require further analysis.
Highlights
Affymetrix GeneChip typically contains multiple probe sets per gene, defined as sibling probe sets in this study
We ask whether differential expression across treatments follows the same trend among sibling probe sets, using a two-way Analysis of Variance (ANOVA) model that includes treatment (τ), probe set (ψ), and their interaction effect
A non-significant interaction effect indicates that the sibling probe sets have the same trend of differential expression over treatments
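The two-way model described in the highlights can be written out as a sketch. Only τ (treatment) and ψ (probe set) come from the text; the indices, the grand mean μ, the error term ε, and the balanced design with a treatments, b sibling probe sets, and n replicates per cell are assumptions made here for illustration, and the paper's exact parameterization may differ.

```latex
% Expression y_{ijk}: treatment i, sibling probe set j, replicate k (assumed indexing)
y_{ijk} = \mu + \tau_i + \psi_j + (\tau\psi)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2)

% Null hypothesis: siblings follow the same trend across treatments
H_0 : (\tau\psi)_{ij} = 0 \quad \text{for all } i, j

% Interaction test under the assumed balanced design
F = \frac{MS_{\tau\psi}}{MS_{\varepsilon}}
\sim F_{(a-1)(b-1),\; ab(n-1)} \quad \text{under } H_0
```

Under this reading, failing to reject H_0 is what marks a group of sibling probe sets as candidates for consolidation.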
Summary
Affymetrix GeneChips typically contain multiple probe sets per gene, defined as sibling probe sets in this study. These probe sets may or may not behave similarly across treatments. In the mouse moe4302 chip, there are 45,101 probe sets corresponding to 25,724 distinct genes, and 40% of all genes are represented by multiple probe sets, called "sibling probe sets" throughout this paper. Almost half of these genes are represented by more than two probe sets on the chip, and some have more than ten. In the human hgu133plus chip, a total of 28,919 genes are represented by 54,675 probe sets (Fig. 1).
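As a minimal sketch of how sibling probe sets might be identified from a chip annotation, the Python below groups probe set IDs by gene and keeps the genes with more than one probe set. The function name `sibling_probe_sets`, the probe IDs, and the gene names are all hypothetical and are not taken from the paper or from any real annotation package; the authors' own implementation is in R.

```python
from collections import defaultdict


def sibling_probe_sets(probe_to_gene):
    """Group probe set IDs by gene and keep genes with more than one probe set.

    probe_to_gene: dict mapping probe set ID -> gene name (None if unannotated).
    Returns a dict mapping gene name -> sorted list of its sibling probe set IDs.
    """
    by_gene = defaultdict(list)
    for probe_id, gene in probe_to_gene.items():
        if gene is not None:  # skip probe sets without a gene annotation
            by_gene[gene].append(probe_id)
    return {gene: sorted(ids) for gene, ids in by_gene.items() if len(ids) > 1}


# Hypothetical annotation, for illustration only.
annotation = {
    "probe_a_at": "GeneA",
    "probe_b_at": "GeneA",
    "probe_c_at": "GeneB",
    "probe_d_at": "GeneC",
    "probe_e_at": "GeneC",
    "probe_f_at": "GeneC",
    "probe_g_at": None,
}

siblings = sibling_probe_sets(annotation)

# Fraction of annotated genes represented by multiple probe sets.
annotated_genes = {g for g in annotation.values() if g is not None}
fraction = len(siblings) / len(annotated_genes)
```

Applied to a real annotation table, the same grouping would yield the per-gene sibling counts summarized in the text, for example the share of genes with multiple probe sets.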