Abstract

DNA N6-methyldeoxyadenosine (6 mA) modifications were first found more than 60 years ago but were long thought to be widespread only in prokaryotes and unicellular eukaryotes. With the development of high-throughput sequencing technology, 6 mA modifications have been found experimentally in a range of multicellular eukaryotes. However, experimental methods are time-consuming and costly, which makes computational alternatives necessary. In this study, a machine learning-based prediction tool, named csDMA, was developed for predicting 6 mA modifications. First, three feature encoding schemes, Motif, Kmer, and Binary, were used to generate the feature matrix. Second, different algorithms were evaluated for the prediction model, and the ExtraTrees model achieved the best AUC of 0.878 under 5-fold cross-validation on the training dataset. The ExtraTrees model also achieved the best AUC of 0.893 on the independent testing dataset. Finally, we compared our method with state-of-the-art predictors, and the results showed that our model achieved better performance than existing tools.
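To make the feature-encoding step more concrete, the following is a minimal sketch of how Kmer and Binary encodings are commonly computed for a DNA sequence. It is not the csDMA implementation: the function names, the choice of k = 2, the normalization, and the handling of ambiguous bases are all assumptions made for illustration, and the Motif scheme is omitted because this summary does not define it.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide order for both encodings


def kmer_features(seq, k=2):
    """Normalized frequency of every length-k substring, in fixed lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    if windows <= 0:
        return [0.0] * len(kmers)
    for i in range(windows):
        sub = seq[i:i + k]
        if sub in counts:  # windows with N or other ambiguous bases are skipped
            counts[sub] += 1
    return [counts[m] / windows for m in kmers]


def binary_features(seq):
    """One-hot encode each base as 4 bits; unknown bases become all zeros."""
    out = []
    for base in seq:
        out.extend(1 if base == a else 0 for a in ALPHABET)
    return out


def encode(seq, k=2):
    """Concatenate the Kmer and Binary vectors into one feature row (hypothetical helper)."""
    seq = seq.upper()
    return kmer_features(seq, k) + binary_features(seq)
```

Stacking `encode(seq)` for every fixed-length sequence window would give one row per sample in the feature matrix, which could then be passed to a tree-ensemble classifier such as ExtraTrees.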

Highlights

  • DNA N6-methyldeoxyadenosine (6 mA) modifications were first discovered in bacteria in 1955[1]

  • The benchmark datasets created for the iDNA6mA-PseKNC (Pseudo K-tuple Nucleotide Composition) and i6mA-Pred predictors were used, and different algorithms were implemented to generate the final optimized model. 5-fold cross-validation showed that our model achieved better performance than existing 6 mA prediction tools

  • We developed an improved tool, called csDMA, for predicting 6 mA modifications in different species
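The 5-fold cross-validation and AUC evaluation mentioned in the highlights can be sketched in plain Python as below. This is an illustrative outline rather than the authors' pipeline: `stratified_folds`, `auc`, and `cross_validated_auc` are hypothetical helpers, and `fit_predict` is an assumed callback standing in for whichever classifier (for example ExtraTrees) is being evaluated.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(labels, fit_predict, k=5, seed=0):
    """Mean AUC over k folds; fit_predict(train_idx, test_idx) returns scores for test_idx."""
    folds = stratified_folds(labels, k, seed)
    results = []
    for f in range(k):
        test_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        scores = fit_predict(train_idx, test_idx)
        results.append(auc([labels[i] for i in test_idx], scores))
    return sum(results) / k
```

In practice `fit_predict` would train a model on the rows indexed by `train_idx` and return predicted probabilities of the 6 mA class for `test_idx`; an independent test set is scored once with `auc` after the model has been chosen.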


Summary

Introduction

DNA N6-methyldeoxyadenosine (6 mA) modifications were first discovered in bacteria in 1955[1]. In 2016, Koziol et al. used dot blots, HPLC, and methyl DNA immunoprecipitation followed by sequencing (MeDIP-seq) to detect 6 mA modifications in vertebrates, including Xenopus laevis, mouse, and human[6]. Because experimental methods are time-consuming and costly, researchers have turned to computational methods for predicting DNA 6 mA modifications. iDNA6mA-PseKNC was the first tool for predicting 6 mA modifications in the Mus musculus genome, and i6mA-Pred was the first identification method for the rice genome. The feature extraction and classification methods proposed in these studies provide a valuable basis for the prediction of DNA 6 mA modifications. Building on them, this study aims to develop a prediction tool that can predict DNA 6 mA modifications across species.

