Quality control of Platinum Spike dataset by probe-level mixed models

Tatsiana Khamiakova, Ziv Shkedy, Dhammika Amaratunga, Willem Talloen, Hinrich Göhlmann, Luc Bijnens, Adetayo Kasim

doi:10.1016/j.mbs.2013.11.004

Abstract

Benchmark datasets are important for validating and optimizing analysis pipelines. Recently, a new benchmark dataset for Affymetrix GeneChip experiments, ‘Platinum Spike’, was introduced. We performed a quality check of the Platinum Spike dataset using probe-level linear mixed models. The results show that some ‘empty’ probe sets detect transcripts spiked in at different concentrations and, conversely, that some probe sets fail to detect transcripts spiked in at different concentrations even though they were designed to do so. We propose a formal inference procedure for testing the assumption that all technical replicates in the data are independent, and conclude that for almost 10% of probe sets the arrays cannot be treated as independent, which has strong implications for normalization procedures and for testing for differential expression. The proposed diagnostic procedure can be used to facilitate a thorough exploration of Affymetrix gene expression data beyond preprocessing and differential expression analysis.

Full Text