Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species.

Leyi Wei, Shasha Luan, Quan Zou, Ran Su, Luis Augusto Eijy Nagai

doi:10.1093/bioinformatics/bty824

Abstract

As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) has recently been shown to play crucial roles in restriction-modification systems. To better understand its functional mechanisms, it is fundamentally important to identify 4mC modification sites. Machine learning methods have recently emerged as an effective and efficient approach for the high-throughput identification of 4mC sites, although high predictive error rates remain a challenge for existing methods. It is therefore highly desirable to develop a computational method that identifies 4mC sites more accurately. In this study, we propose a machine-learning-based predictor, 4mcPred-SVM, for the genome-wide detection of DNA 4mC sites. In this predictor, we present a new feature representation algorithm that sufficiently exploits sequence-based information. To improve the feature representation ability, we use a two-step feature optimization strategy, thereby obtaining the most representative features. Using the resulting features and a Support Vector Machine (SVM), we adaptively train the optimal models for different species. Comparative results on benchmark datasets from six species indicate that our predictor generally achieves better performance in predicting 4mC sites than the state-of-the-art predictors. Importantly, the sequence-based features can reliably and robustly predict 4mC sites, facilitating the discovery of potentially important sequence characteristics for the prediction of 4mC sites. A user-friendly web server that implements the proposed 4mcPred-SVM is freely accessible at http://server.malab.cn/4mcPred-SVM. Supplementary data are available at Bioinformatics online.
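The abstract does not spell out which sequence-based features 4mcPred-SVM uses, so the following is only a minimal sketch of one common kind of sequence-derived descriptor, normalized k-mer composition, and not the authors' algorithm. The function name `kmer_frequencies` and the choice of `k=2` are assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies over ACGT, in lexicographic k-mer order.

    Hypothetical helper for illustration; not taken from the 4mcPred-SVM paper.
    """
    seq = seq.upper()
    # All 4**k possible k-mers, e.g. AA, AC, AG, AT, CA, ... for k=2.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# A short toy sequence; real 4mC benchmark samples are longer windows around a cytosine.
features = kmer_frequencies("ACGTACGT", k=2)
print(len(features))
```

A fixed-length vector of this kind could, in principle, be passed to a feature-selection step and then to an SVM classifier, which is the general shape of the pipeline the abstract describes.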

Journal: Bioinformatics
Publication Date: Sep 19, 2018
Citations: 156

