Abstract

DNA N6-methyladenine (6mA) is an epigenetic modification involved in many biological regulation processes, such as DNA replication, DNA repair, transcription, and gene expression regulation. The widespread presence of 6mA in eukaryotes was unclear until recently. Studying the genome-wide distribution of 6mA can provide a deeper understanding of this epigenetic modification and the biological processes it is involved in. Existing experimental techniques are time-consuming, and computational machine learning methods leave room for performance improvement; cross-species prediction of 6mA sites in eukaryotes in particular shows low performance. Hence, there is a need for a more accurate, time-efficient method to predict the distribution of 6mA sites in eukaryotes. Since deep learning architectures have shown higher accuracy, we develop a customized VGG16-based model named 6mAVGG, built on convolutional neural networks, for the prediction of DNA 6mA sites in eukaryotes. We introduce a novel 3-dimensional encoding mechanism that extends the one-hot encoding method to support the input of the VGG16 model. In 10-fold cross-validation on the benchmark datasets, the proposed model achieves accuracies of 98.01%, 97.44%, and 99.56% for the cross-species, rice, and M. musculus genomes, respectively. The proposed model outperforms existing tools for the prediction of 6mA sites, improving accuracy by 2.88%, 4.2%, and 0.9% for the cross-species, rice, and M. musculus genomes, respectively, compared to the state-of-the-art method SNNRice6mA. The model trained with benchmark data predicts 6mA sites of the other species Arabidopsis thaliana, Rosa chinensis, Drosophila, and yeast with prediction accuracy over 70%. Thus, this model can be used for the genome-wide prediction of 6mA sites in eukaryotes.
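
For readers unfamiliar with the one-hot encoding that the abstract builds on, the following is a minimal sketch of the standard scheme for DNA sequences. It is not the 3-dimensional encoding proposed in the paper, whose details are not given in this excerpt. The function name `one_hot`, the base order A, C, G, T, and the all-zero row for unknown bases are assumptions made for illustration.

```python
# Standard one-hot encoding of a DNA sequence (illustrative only;
# the paper's 3-dimensional extension is not reproduced here).
BASES = "ACGT"  # assumed column order


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, one row per base.

    Bases outside A/C/G/T (for example N) become an all-zero row,
    which is an assumption of this sketch.
    """
    index = {base: i for i, base in enumerate(BASES)}
    encoded = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    # Each row has a single 1 marking the base at that position.
    for base, row in zip("GATN", one_hot("GATN")):
        print(base, row)
```

A sequence of length L therefore becomes an L x 4 matrix; a convolutional image model such as VGG16 expects a 3-dimensional input, which is the gap the paper's encoding is stated to address.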

Highlights

  • Epigenetics is the study of chemical modifications to DNA that change the way genes are expressed without altering the underlying genetic sequence [1], [76]

  • Performance comparison on the rice 6mA benchmark dataset: from our literature review, we found seven existing tools built on 6mA site data from the rice genome that can predict 6mA sites in that genome: SNNRice6mA (Yu et al., 2019) [30], i6mA-Pred (Chen et al., 2019) [23], SDM6A (Basith et al., 2019) [26], iDNA6mA (Tahir et al., 2019) [25], MM6mAPred (Pian et al., 2019) [24], iDNA6mA-rice (Lv et al., 2019) [27], and 6mA-RicePred (Huang et al., 2020) [72]. iDNA6mAPseKNC (Feng et al., 2019) [22] is a tool built on the M. musculus dataset that can be applied to many other species

  • Performance was compared on five metrics that are commonly used in many studies [30], [31], [64]: accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), and area under the curve (AUC)
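
The five metrics named above can be computed from a binary confusion matrix and a set of prediction scores. The sketch below uses the textbook definitions, not code from the paper. The function names `binary_metrics` and `auc_from_scores` are hypothetical, and the example counts are made up for illustration. It assumes both classes are present, so that no denominator in sensitivity or specificity is zero.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n_pos * n_neg),
    which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    print(binary_metrics(tp=90, tn=80, fp=20, fn=10))
    print(auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Unlike accuracy, MCC uses all four confusion-matrix cells, and AUC is independent of the decision threshold, which is why the two are often reported alongside the threshold-dependent metrics.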


Summary

Introduction

Epigenetics is the study of chemical modifications to DNA that change the way genes are expressed without altering the underlying genetic sequence [1], [76]. DNA methylation is a type of epigenetic modification that results in unexpected activation or repression of genes [76]. The most common types of DNA methylation are N4-methylcytosine (4mC), 5-methylcytosine (5mC), and N6-methyladenine (6mA), which have been found in both prokaryotic and eukaryotic genomes [2], [71]. The epigenetic modification process of 6mA in eukaryotes, and the biological processes it is involved in, remain poorly understood because research in this area is limited [67]. Finding the genome-wide distribution of DNA 6mA sites can provide a better understanding of both.

