Abstract

DNA N4-methylcytosine (4mC) is one of the key epigenetic alterations, playing essential roles in DNA replication, differentiation, cell cycle, and gene expression. To better understand the biological functions of 4mC, it is crucial to know its genomic distribution. Recently, a few computational studies, in particular machine learning (ML) approaches, have been applied to the prediction of 4mC sites. Although ML-based methods are promising for 4mC identification in other species, none are available for detecting 4mCs in the mouse genome. Our novel computational approach, called 4mCpred-EL, is the first method for identifying 4mC sites in the mouse genome; it combines four different ML algorithms with seven feature encodings. The probabilistic values predicted from those feature encodings are then used as a feature vector and input once again to the ML algorithms, whose corresponding models are integrated by ensemble learning. Our benchmarking results demonstrated that 4mCpred-EL achieved accuracy and MCC values of 0.795 and 0.591, significantly exceeding those of seven other classifiers by 1.5–5.9% and 3.2–11.7%, respectively. Additionally, in the independent evaluation, 4mCpred-EL attained an overall accuracy of 79.80%, which is 1.8–5.1% higher than that of the seven other classifiers. We provide a user-friendly web server, also named 4mCpred-EL, which can be used as a pre-screening tool for identifying potential 4mC sites in the mouse genome.
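The two metrics reported above, accuracy and the Matthews correlation coefficient (MCC), can be computed from the counts of a binary confusion matrix. The following is a minimal sketch using the standard definitions of both metrics, not the authors' evaluation code. The function names and the toy labels are hypothetical, and the label convention (1 = 4mC site, 0 = non-4mC site) is an assumption made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels.

    Assumed convention for this sketch: 1 = 4mC site, 0 = non-4mC site.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels, invented purely for illustration (not data from the paper).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print("accuracy:", accuracy(*counts))
    print("MCC:", mcc(*counts))
```

Unlike accuracy, MCC uses all four confusion-matrix counts, so it stays informative when one class is predicted much better than the other.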

Highlights

  • Dynamic DNA modifications, such as methylation and demethylation, play crucial roles in the regulation of gene expression

  • Methylation of cytosine at CpG sites is considered an important epigenetic mark that is involved in the regulation of cell differentiation, genomic imprinting, cell cycle, aging, preservation of chromosome stability, and gene expression levels [1,2]

  • Owing to its widespread distribution and multifaceted roles, 5mC is the most common and best-explored type of cytosine methylation; it plays a significant role in several biological processes [5,6] and is associated with neurological diseases, diabetes, and cancer [7,8,9]. 4mC is regarded as a potent epigenetic modification that protects self-DNA from restriction enzyme-mediated degradation

Summary

Introduction

Dynamic DNA modifications, such as methylation and demethylation, play crucial roles in the regulation of gene expression. Methylation of cytosine at CpG sites is considered an important epigenetic mark that is involved in the regulation of cell differentiation, genomic imprinting, cell cycle, aging, preservation of chromosome stability, and gene expression levels [1,2]. Among the three common cytosine methylations identified in both prokaryotic and eukaryotic genomes are N4-methylcytosine (4mC) and 5-methylcytosine (5mC) (mediated enzymatically via DNA methyltransferases). 4mC is regarded as a potent epigenetic modification that protects self-DNA from restriction enzyme-mediated degradation. Knowledge of the exact mechanisms of epigenetic modification and of the biological functions of 4mC sites remains limited.
