Abstract

Epigenetics is an essential biological frontier linking genetics to the environment, and DNA methylation is one of the most studied epigenetic events. In recent years, epigenome-wide association studies (EWAS) have identified thousands of phenotype-related methylation sites. However, the overlap between the phenotype-related DNA methylation sites identified by different studies is often small, possibly because methylation remodeling within the genome is partly random. Identifying robust gene-phenotype associations is therefore crucial for interpreting pathogenesis, yet how to integrate the methylation values of different sites on the same gene and mine DNA methylation at the gene level remains a challenge. A recent study found that the difference in DNA methylation between the gene body and the promoter region is strongly correlated with gene expression. In this study, we proposed the Statistical difference of DNA Methylation between Promoter and Other Body Region (SIMPO) algorithm to extract DNA methylation values at the gene level. First, using smoking as an environmental exposure factor, our method significantly improved the gene overlap between different datasets (from 5% to 17%). In addition, the biological significance of phenotype-related genes identified by the SIMPO algorithm is comparable to that of traditional probe-based methods. We then selected two disease contexts, insulin resistance and Parkinson’s disease, to show that the biological efficiency of disease-related gene identification increased from 15.43% to 44.44% (p-value = 1.20e-28). In summary, our results indicate that mining the selective remodeling of DNA methylation in promoter regions can identify robust gene-level associations with phenotype, and that the characteristic remodeling of a given gene’s promoter region can reflect the essence of disease.
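The abstract describes SIMPO only at a high level: a gene-level value derived from the methylation difference between the promoter and the other regions of the same gene. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes the score for one sample is the mean promoter beta value minus the mean beta value of the gene's remaining probes; the actual "statistical difference" may well be a test statistic instead. The function name `simpo_like_score`, the probe IDs, and the Illumina-style region labels (TSS200, TSS1500, Body) are illustrative assumptions, and the `promoter_regions` parameter is meant to mirror the SIMPO-TSS200, SIMPO-TSS1500, and SIMPO-TSS200&TSS1500 variants named in the Highlights.

```python
from statistics import mean


def simpo_like_score(betas, annotation, promoter_regions=("TSS200",)):
    """Hypothetical gene-level score for one sample: promoter mean minus other-region mean.

    betas: {probe_id: methylation beta value in [0, 1]}
    annotation: {probe_id: (gene, region_label)}
    promoter_regions: region labels treated as promoter (assumption);
        every other probe of the same gene counts as "other region".
    """
    groups = {}
    for probe, beta in betas.items():
        if probe not in annotation:
            continue  # skip probes with no gene annotation
        gene, region = annotation[probe]
        part = "promoter" if region in promoter_regions else "other"
        groups.setdefault(gene, {"promoter": [], "other": []})[part].append(beta)

    scores = {}
    for gene, parts in groups.items():
        # A difference is only defined when both groups have at least one probe.
        if parts["promoter"] and parts["other"]:
            scores[gene] = mean(parts["promoter"]) - mean(parts["other"])
    return scores


# Toy example with made-up probes: GENE_B has no promoter probe and is skipped.
betas = {"cg01": 0.8, "cg02": 0.6, "cg03": 0.2, "cg04": 0.4, "cg05": 0.5}
annotation = {
    "cg01": ("GENE_A", "TSS200"),
    "cg02": ("GENE_A", "TSS200"),
    "cg03": ("GENE_A", "Body"),
    "cg04": ("GENE_A", "Body"),
    "cg05": ("GENE_B", "Body"),
}
scores = simpo_like_score(betas, annotation)
print(scores)  # GENE_A only, approximately 0.4
```

A plain mean difference keeps the example readable; replacing it with a t-statistic or a rank-based statistic would only change the final loop.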

Highlights

  • The results are shown in Supplementary Figure S1. For the SIMPO-TSS200 algorithm, the SIMPO scores of 43.44% of genes are significantly correlated with average mRNA transcription (Supplementary Figure S1G; Supplementary Table S1); for the SIMPO-TSS1500 algorithm, the figure is 41.22% (Supplementary Figure S1H; Supplementary Table S2); and for the SIMPO-TSS200&TSS1500 algorithm, it is 41.18% (Supplementary Figure S1I; Supplementary Table S3)

  • Comparison with known tobacco use disorder-associated genes shows that the biological significance of phenotype-related genes identified by the SIMPO algorithm is comparable to that of traditional probe-based methods
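The Highlights report the share of genes whose SIMPO scores are significantly correlated with average mRNA transcription. The correlation test is not stated here, so the following sketch assumes a per-gene Pearson correlation across samples, a two-sided p-value from the Fisher z-transform (a normal approximation that is reasonable for large cohorts), and alpha = 0.05. All function names and the toy data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fisher_z_p_value(r, n):
    """Two-sided p-value for r via the Fisher z-transform (normal approximation, n >= 4)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| = 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def fraction_significant(scores, expression, alpha=0.05):
    """Share of genes whose per-sample scores correlate significantly with expression.

    scores, expression: {gene: [value per sample]}, samples in the same order.
    """
    tested = significant = 0
    for gene, s in scores.items():
        e = expression.get(gene)
        if e is None or len(e) != len(s) or len(s) < 4:
            continue  # skip genes without matched data or with too few samples
        tested += 1
        if fisher_z_p_value(pearson_r(s, e), len(s)) < alpha:
            significant += 1
    return significant / tested if tested else 0.0


# Toy data (10 samples): GENE_A tracks expression, GENE_B does not.
scores = {
    "GENE_A": [float(i) for i in range(10)],
    "GENE_B": [1.0, -1.0] * 5,
}
expression = {
    "GENE_A": [2.0 * i + 1.0 for i in range(10)],
    "GENE_B": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
}
print(fraction_significant(scores, expression))  # 0.5
```

With real data one would normally use an exact t-based p-value and correct for testing thousands of genes; the normal approximation is used here only to keep the sketch dependency-free.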

Summary

Introduction

Epigenetics is a branch of genetics that studies heritable changes in gene expression that occur without changes in the nucleotide sequence of a gene (Fraga et al, 2005). Its mechanisms include DNA methylation, histone modification, and regulation by noncoding RNA, among which DNA methylation is a major focus (Dahl and Guldberg, 2003). We analyzed a set of samples from 1,202 individuals with both transcriptome and DNA methylation data and found that the SIMPO scores of ∼40% of genes were significantly correlated with mRNA expression values, indicating that SIMPO scores correlate well with gene expression. Comparison with known tobacco use disorder-associated genes showed that the biological significance of phenotype-related genes identified by the SIMPO algorithm is comparable to that of the traditional probe-based method (DMGs). The SIMPO algorithm shows good robustness and biological efficacy and can be further applied to phenotype and disease research in epigenetic biology.
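The Introduction compares SIMPO-identified genes with known tobacco use disorder-associated genes, and the abstract reports the biological efficiency of disease-related gene identification rising from 15.43% to 44.44%. The summary does not state how such comparisons or the reported p-value were computed. As a hedged illustration only, a one-sided hypergeometric test is one common way to ask whether an identified gene list contains more known disease genes than expected by chance. Reading "biological efficiency" as the share of identified genes that are known disease genes is an assumption, and the function names and gene names below are made up.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K successes, n draws)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def known_gene_enrichment(identified, known, background):
    """Overlap of an identified gene list with known disease genes.

    Returns (hits, share of identified genes that are known, one-sided p-value).
    Genes outside `background` are ignored.
    """
    bg = set(background)
    ident = set(identified) & bg
    ref = set(known) & bg
    hits = len(ident & ref)
    share = hits / len(ident) if ident else 0.0
    p = hypergeom_sf(hits, len(bg), len(ref), len(ident))
    return hits, share, p


# Toy example: 20 background genes, 5 known disease genes, 5 identified genes.
background = [f"G{i}" for i in range(20)]
known = ["G0", "G1", "G2", "G3", "G4"]
identified = ["G0", "G1", "G2", "G3", "G10"]
hits, share, p = known_gene_enrichment(identified, known, background)
print(hits, share, round(p, 4))  # 4 0.8 0.0049
```

Restricting both lists to a shared background (for example, all genes that received a score) matters: without it, the expected overlap and therefore the p-value are not well defined.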

