Abstract
Background: Interpretation of gene expression microarrays requires a mapping from probe set to gene. On many Affymetrix gene expression microarrays, a given gene may be detected by multiple probe sets, which may deliver inconsistent or even contradictory measurements. Therefore, obtaining an unambiguous expression estimate of a pre-specified gene can be a nontrivial but essential task.
Results: We developed scoring methods to assess each probe set for specificity, splice isoform coverage, and robustness against transcript degradation. We used these scores to select a single representative probe set for each gene, thus creating a simple one-to-one mapping between gene and probe set. To test this method, we evaluated concordance between protein measurements and gene expression values, and between sets of genes whose expression is known to be correlated. For both test cases, we identified genes that were nominally detected by multiple probe sets, and we found that the probe set chosen by our method showed stronger concordance.
Conclusions: This method provides a simple, unambiguous mapping to allow assessment of the expression levels of specific genes of interest.
Highlights
Interpretation of gene expression microarrays requires a mapping from probe set to gene
The gene detected by the largest number of probes in a probe set is considered the targeted gene of the probe set
The coverage score Sc of a probe set is the fraction of all transcripts belonging to the targeted gene that are detected by the probe set
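The two definitions above (targeted gene and coverage score Sc) can be sketched in a few lines of Python. This is a minimal illustration, not the Jetset implementation: the input layout (`probe_hits`, `gene_transcripts`), the function names, the alphabetical tie-break, and the `min_probes` threshold for calling a transcript "detected" are all assumptions made for the example.

```python
from collections import Counter


def targeted_gene(probe_hits):
    """Return the gene matched by the largest number of probes in a probe set.

    probe_hits maps a probe ID to the set of (gene, transcript) pairs that
    the probe matches. This layout is hypothetical, chosen for the sketch.
    """
    counts = Counter()
    for hits in probe_hits.values():
        # Count each probe at most once per gene.
        for gene in {g for g, _ in hits}:
            counts[gene] += 1
    if not counts:
        return None
    # Assumption: ties are broken alphabetically so the result is deterministic.
    return min(counts, key=lambda g: (-counts[g], g))


def coverage_score(probe_hits, gene_transcripts, min_probes=1):
    """Return (targeted gene, Sc) for one probe set.

    Sc is the fraction of the targeted gene's transcripts that are detected.
    Assumption: a transcript counts as detected when at least `min_probes`
    probes match it; the source does not state the threshold.
    """
    gene = targeted_gene(probe_hits)
    if gene is None:
        return None, 0.0
    all_transcripts = gene_transcripts.get(gene, set())
    if not all_transcripts:
        return gene, 0.0
    probes_per_transcript = Counter()
    for hits in probe_hits.values():
        for g, transcript in hits:
            if g == gene:
                probes_per_transcript[transcript] += 1
    detected = {
        t for t in all_transcripts if probes_per_transcript[t] >= min_probes
    }
    return gene, len(detected) / len(all_transcripts)


# Made-up example data: three probes, two candidate genes.
example_hits = {
    "p1": {("ESR1", "tx1"), ("ESR1", "tx2")},
    "p2": {("ESR1", "tx1")},
    "p3": {("GENEX", "txA")},
}
example_transcripts = {
    "ESR1": {"tx1", "tx2", "tx3", "tx4"},
    "GENEX": {"txA"},
}
example_gene, example_sc = coverage_score(example_hits, example_transcripts)
```

With the made-up data, two of the three probes match ESR1, so ESR1 is the targeted gene, and two of its four transcripts are matched by at least one probe. Raising `min_probes` makes the detection criterion stricter and can only lower Sc.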
Summary
Algorithm: We acquired probe sequences for four widely used human gene expression microarrays from Affymetrix: U95Av2, U133A, U133 Plus 2.0, and X3P.
Comparison with ER and HER2 status in breast tumors: We evaluated the Jetset mappings using a publicly available data set representing 286 breast tumor specimens with HG-U133A microarray measurements and ER protein status as determined by ligand-binding assay, enzyme immunoassay, or immunohistochemistry [19]. In these data, we expected that the ER protein status should correlate with the ESR1 gene expression level. We evaluated the performance of two alternative probe set definitions: the Brainarray “hgu133ahsentrezgcdf” and the GATExplorer “genemapperhgu133acdf”, both of which redefine probe sets such that each queries an individual gene [4,8]. In both cases, the remapped probe set querying ESR1 failed to detect strong differential expression between ER-positive and ER-negative tumors (Figure 2a).
The Jetset data and package will be updated following the Bioconductor release cycle (approximately every six months).
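The one-to-one mapping described in the abstract (one representative probe set per gene, chosen from per-probe-set scores) can be illustrated with a short Python sketch. It is not the Jetset package code. Combining the three scores by taking their product, the alphabetical tie-break, the function name, and the probe set IDs and score values below are all assumptions made for illustration.

```python
def select_representatives(score_rows):
    """Pick one probe set per gene from per-probe-set scores.

    score_rows is an iterable of tuples:
        (probe_set_id, gene, specificity, coverage, robustness)
    with each score in [0, 1].

    Assumption: the overall score is the product of the three component
    scores. Ties are broken by the alphabetically smaller probe set ID.
    """
    best = {}  # gene -> (probe_set_id, overall_score)
    for probe_set_id, gene, specificity, coverage, robustness in score_rows:
        overall = specificity * coverage * robustness
        current = best.get(gene)
        if (
            current is None
            or overall > current[1]
            or (overall == current[1] and probe_set_id < current[0])
        ):
            best[gene] = (probe_set_id, overall)
    return {gene: probe_set_id for gene, (probe_set_id, _) in best.items()}


# Hypothetical probe set IDs and made-up scores, for illustration only.
example_rows = [
    ("ps_A", "GENE1", 0.9, 0.5, 0.8),
    ("ps_B", "GENE1", 1.0, 1.0, 0.9),
    ("ps_C", "GENE2", 0.7, 1.0, 1.0),
]
example_mapping = select_representatives(example_rows)
```

Here GENE1 is nominally detected by two probe sets, and the sketch keeps only the one with the higher combined score, which is the situation the concordance tests above were designed to examine.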