Abstract

N4-methylcytosine (4mC) is a type of DNA modification that helps regulate multiple biological processes, such as DNA replication, gene expression, and transcriptional regulation. Accurate prediction of 4mC sites can provide precise information about their genetic functions. The purpose of this study was to establish a robust deep learning model to recognize 4mC sites in Geobacter pickeringii. In the proposed model, two kinds of feature descriptors, namely binary and k-mer composition, were used to encode the DNA sequences of Geobacter pickeringii. The fused features were optimized using correlation and a gradient-boosting decision tree (GBDT)-based algorithm with the incremental feature selection (IFS) method. These optimized features were then fed into a 1D convolutional neural network (CNN) to distinguish 4mC sites from non-4mC sites in Geobacter pickeringii. On independent data, the proposed model achieved an accuracy of 0.868, which was 4.2% higher than that of the existing model.
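
The two feature descriptors named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the bit assignment for each nucleotide, the choice of k = 2, the frequency normalization, and the function names `binary_encode`, `kmer_composition`, and `fuse_features` are all assumptions made for illustration.

```python
from itertools import product

BASES = "ACGT"  # assumed nucleotide order for both descriptors


def binary_encode(seq):
    """One-hot encode each nucleotide as 4 bits (assumed: A=1000, C=0100, G=0010, T=0001)."""
    onehot = {b: [1 if i == j else 0 for j in range(4)] for i, b in enumerate(BASES)}
    features = []
    for base in seq.upper():
        # Bases outside ACGT (e.g. N) are encoded as all zeros in this sketch.
        features.extend(onehot.get(base, [0, 0, 0, 0]))
    return features


def kmer_composition(seq, k=2):
    """Return the normalized frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of sliding windows
    for i in range(max(total, 0)):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    return [counts[m] / total if total > 0 else 0.0 for m in kmers]


def fuse_features(seq, k=2):
    """Concatenate the binary and k-mer descriptors into one feature vector."""
    return binary_encode(seq) + kmer_composition(seq, k)


if __name__ == "__main__":
    vector = fuse_features("ACGT", k=2)
    print(len(vector))
```

For a sequence of length n, this sketch yields 4n binary features plus 4^k composition features. A fused vector of this kind would then be passed to a feature-selection step and a classifier, as the abstract describes.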

Highlights

  • The fused features were also fed into LSTM [21], gradient-boosting decision tree (GBDT) [22], and RF [23,24] models for comparison with the convolutional neural network (CNN)-based model [25]

Summary

Introduction

Alterations in DNA play a significant role in gene expression and regulation, DNA replication, and transcriptional regulation. A few computational and mathematical methods have been introduced to predict 4mC sites in multiple species. The first computational model to predict 4mC sites in multiple species was built on the basis of a confirmed 4mC dataset. Tang et al. [12] introduced a new linear integration method that merges existing models for the identification of 4mC sites. Afterwards, Manavalan et al. [13] established a new tool, Meta4mCpred, to recognize 4mC sites in six different species. The first deep learning model, 4mCCNN, utilized numerous feature combinations for the prediction of 4mC sites in multiple genomes [18].

Evaluation
Sequence Composition Analysis
Comparison on the Basis of Independent Data
Materials and Methods
Feature Descriptors
Binary
Correlation
GBDT with IFS
Convolutional Neural Network
Metrics Evaluation
Conclusions