Background Traditionally, DNA methylation is described as a mechanism that represses gene expression by blocking the binding of transcription factors at promoter regions. However, many studies suggest that variation in DNA methylation contributes to gene expression in diverse ways and through complex underlying mechanisms. Methods To examine the relationship between gene expression and DNA methylation, we tested gene-CpG pairs (GCPs) for correlation between gene expression and methylation across different tissues and ages. We collected tissues that have both methylation data and matched expression data from various databases and publications. After sample filtering, we retained 502 adult prefrontal cortex (PFC) samples, 43 lymphoblastoid cell lines (LCL), 75 livers, 1,201 monocyte samples, and 247 developmental PFC samples spanning the lifespan. Methylation data were filtered with the ChAMP package. Because the expression data came from different platforms, the value used as the expression measure differed by platform. ComBat and SVA were used to control for batch effects and other confounding effects in both the expression and the methylation data. To remove less variable methylation probes, we filtered out the 50% of probes with the smallest median absolute deviation. For each tissue, we defined a GCP as a gene paired with a CpG site located within the 10 kb flanking region of the corresponding gene or expression probe. We used Spearman's rank correlation test to assess the correlation of each methylation-expression pair and applied False Discovery Rate (FDR) correction for multiple hypothesis testing. Results In the adult PFC, 230,087 GCPs were tested for Spearman correlation between gene expression and CpG site methylation, and only 8,828 GCPs were significantly correlated after FDR correction. Of these, 5,530 GCPs were negatively correlated and 3,298 GCPs were positively correlated.
The developmental PFC and the other tissues also showed only a small proportion of significant GCPs: 544/174,911 in the LCL, 2,281/210,025 in the liver, 5,134/46,831 in the monocytes and 9,729/134,690 in the developmental PFC. Although thousands of significant GCPs were identified across tissues, only 37 were consistent in at least three tissues, and three were shared by all tissues. Additionally, 442 negatively and 153 positively correlated GCPs were shared between the adult PFC and the developmental PFC. For the CpGs in significant GCPs, we examined the epigenetic state using the Roadmap 15-core state annotation. Negatively correlated CpGs were enriched in active regions, including Active TSS, Flanking Active TSS, and Enhancers, while positively correlated CpGs were enriched in repressive regions such as Bivalent Enhancer and Repressed PolyComb. For the genes in significant GCPs, we performed GO term enrichment analysis using WebGestalt. Genes in correlated GCPs in the developmental PFC were enriched for neuron projection development (p = 3.96e-12, FDR = 6.11e-09), neuron differentiation (p = 1.08e-11, FDR = 9.25e-09) and synapse part (p = 0e+00, FDR = 0e+00). In the adult PFC data, genes were also enriched for neuron part (p = 2.93e-13, FDR = 3.02e-11), a GO term distinct from those enriched in the developmental PFC. Discussion These results suggest that DNA methylation plays a specific regulatory role in cell specification and development.
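The analysis steps named in the Methods (variability filtering by median absolute deviation, pairing CpGs with genes inside a 10 kb window, Spearman correlation, and FDR correction) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. All function names and data layouts below are hypothetical. The sketch assumes that the flanking window covers the gene body plus 10 kb on each side, and that FDR correction means the Benjamini-Hochberg procedure; the abstract specifies neither. The p-values of the Spearman test are taken as input here, since in practice they would come from a statistics library.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of one probe across samples."""
    m = median(values)
    return median(abs(v - m) for v in values)


def keep_most_variable(probes, fraction=0.5):
    """Keep the given fraction of probes with the largest MAD.

    probes: {probe_id: [methylation value per sample]} (hypothetical layout).
    """
    ranked = sorted(probes, key=lambda name: mad(probes[name]), reverse=True)
    n_keep = int(len(ranked) * fraction)
    return {name: probes[name] for name in ranked[:n_keep]}


def pair_cpgs_with_genes(genes, cpgs, flank=10_000):
    """Pair each gene with CpGs on the same chromosome inside the window.

    Assumption: the window is [start - flank, end + flank].
    genes: {gene: (chrom, start, end)}, cpgs: {cpg_id: (chrom, position)}.
    """
    pairs = []
    for gene, (g_chrom, start, end) in genes.items():
        for cpg, (c_chrom, pos) in cpgs.items():
            if c_chrom == g_chrom and start - flank <= pos <= end + flank:
                pairs.append((gene, cpg))
    return pairs


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A GCP would then count as significant when its adjusted p-value falls below a chosen FDR threshold, with the sign of `spearman_rho` separating negatively from positively correlated pairs. The pairing loop is quadratic in the number of genes and CpGs, so it serves only as an illustration; array-scale data would need an interval index.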