Abstract

DNA methylation is an epigenetic modification that plays a fundamental role in gene regulation. Methylation adds methyl groups to specific cytosine residues, thereby influencing chromatin architecture. Multiple studies have shown that the regulatory effect of DNA methylation on genes is linked to the onset and progression of several disorders. Through epigenome-wide association studies (EWAS), researchers have recently uncovered thousands of phenotype-related methylation sites. However, combining the methylation levels of the multiple sites within a gene into a single gene-level methylation value remains challenging. In this study, we propose the supervised UMAP Assisted Gene-level Methylation method (sUAGM) for disease prediction. It is based on supervised UMAP (Uniform Manifold Approximation and Projection), a manifold-learning method for dimensionality reduction. The gene-level methylation values generated by the proposed method are evaluated with various feature selection and classification algorithms on three distinct DNA methylation datasets derived from blood samples. Performance is assessed using classification accuracy, F1-score, Matthews Correlation Coefficient (MCC), Kappa, Classification Success Index (CSI) and Jaccard Index. The Support Vector Machine with a linear kernel (SVML), combined with Recursive Feature Elimination (RFE), performs best across all three datasets. In a comparative analysis, our method outperformed existing gene-level and site-level approaches, achieving 100% accuracy and F1-score with fewer genes. Functional analysis of the top 28 genes selected from the Parkinson's disease dataset revealed a significant association with the disease.
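
The six evaluation metrics named above can all be derived from the counts of a binary confusion matrix (disease vs. control). The following is a minimal sketch of those formulas, not the authors' evaluation code: the names `Confusion` and `binary_metrics` are hypothetical, and the CSI formula used here (precision + recall - 1, i.e. PPV + TPR - 1) is an assumption based on a common binary-case definition of the Classification Success Index; the paper may define it differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for binary confusion-matrix counts (positive = disease)."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising ZeroDivisionError on an empty denominator.
    return num / den if den else 0.0


def binary_metrics(c: Confusion) -> dict:
    """Compute accuracy, F1, MCC, Cohen's kappa, CSI and Jaccard from counts."""
    n = c.tp + c.fp + c.tn + c.fn
    precision = _ratio(c.tp, c.tp + c.fp)  # positive predictive value (PPV)
    recall = _ratio(c.tp, c.tp + c.fn)     # sensitivity / true positive rate (TPR)
    accuracy = _ratio(c.tp + c.tn, n)
    f1 = _ratio(2 * precision * recall, precision + recall)

    # Matthews correlation coefficient.
    mcc_den = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    mcc = _ratio(c.tp * c.tn - c.fp * c.fn, mcc_den)

    # Cohen's kappa: observed agreement p_o vs. chance agreement p_e.
    p_o = accuracy
    p_e = _ratio(
        (c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn), n * n
    )
    kappa = _ratio(p_o - p_e, 1 - p_e)

    # ASSUMPTION: binary CSI taken as PPV + TPR - 1.
    csi = precision + recall - 1

    # Jaccard index of the positive class: TP / (TP + FP + FN).
    jaccard = _ratio(c.tp, c.tp + c.fp + c.fn)

    return {
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
        "csi": csi,
        "jaccard": jaccard,
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    for name, value in binary_metrics(Confusion(tp=40, fp=10, tn=45, fn=5)).items():
        print(f"{name}: {value:.4f}")
```

A perfect classifier (no false positives or false negatives) scores 1.0 on every metric in this sketch, which is the situation described by the reported 100% accuracy and F1-score. MCC and Kappa are included because, unlike accuracy, they account for class imbalance and chance agreement.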
