The HumanMethylation450 BeadChip array (450K; Infinium) is a widely used tool in epigenomics. A recognized concern with the 450K platform is the potential effect of the number of probes per gene (PG) on the ranking of differentially methylated (DM) CpGs (DM-CpGs) before testing for enrichment of gene ontology categories. We previously showed in a fatty acid (FA)-induced DNA methylation profiling study that, when ranking is based on the ratio of the number of called DM-CpGs to PG, the list of the 150 top-ranking genes is enriched in pathways that overlap with the corresponding Affymetrix array-based expression data. In this study, a comparative analysis of thirteen 450K-based studies representing FA-stimulated cellular models, aging, and diseased and normal tissues revealed that the 150 top-ranking DM-CpGs are in high-PG genes. This points to a significant false-negative rate in the low-PG gene set when delta-beta-based ranking is performed. We show that PG is not related to the density of methylation-prone sites, as it does not follow gene length or GC content. Conversely, ranking genes by the ratio of the number of DM-CpGs to PG and analysing the 150 top-ranking entries yields significantly enriched disease- or tissue-specific gene function categories that are increased both in number and in the degree of overlap with expression data, compared with delta-beta-only ranking or with the previously published gometh-based pipeline. The 15 top-ranking loci list is also significantly enriched in non-coding RNAs, a greatly underrepresented transcript type on the 450K array. In summary, the proposed simple normalization method yields pathobiologically relevant DM-CpGs. This method is also relevant to the newly developed MethylationEPIC (Infinium) microarray.
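The normalization described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `rank_genes_by_dm_ratio`, the gene labels, the probe counts, and the tie-breaking rule are all hypothetical, and the sketch assumes each DM-CpG has already been assigned to a single gene.

```python
from collections import Counter


def rank_genes_by_dm_ratio(dm_cpg_genes, probes_per_gene, top_n=150):
    """Rank genes by (number of called DM-CpGs) / (probes per gene, PG).

    dm_cpg_genes: one gene label per called DM-CpG (hypothetical input format).
    probes_per_gene: mapping of gene label to the number of array probes (PG).
    """
    dm_counts = Counter(dm_cpg_genes)
    # Skip genes with no known probe count to avoid dividing by zero.
    ratios = {
        gene: count / probes_per_gene[gene]
        for gene, count in dm_counts.items()
        if probes_per_gene.get(gene, 0) > 0
    }
    # Highest ratio first; ties broken alphabetically (an assumption of this sketch).
    ranked = sorted(ratios.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy, made-up data: GENE_A has the most DM-CpGs but also the most probes.
dm_cpg_genes = ["GENE_A", "GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_C"]
probes_per_gene = {"GENE_A": 60, "GENE_B": 4, "GENE_C": 10}

top = rank_genes_by_dm_ratio(dm_cpg_genes, probes_per_gene, top_n=2)
print(top)
```

In this toy input, the gene with the largest raw DM-CpG count drops out of the top two once counts are divided by PG, which is the kind of reordering the ratio-based ranking is meant to produce for low-PG genes.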