Abstract

N6-methyladenine (6mA) is an important DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding the biological functions of 6mA. However, the existing experimental techniques for detecting 6mA sites are costly, which creates a strong need for new computational methods. In this paper, we developed a deep learning framework named Deep6mA to identify DNA 6mA sites. It requires neither prior knowledge of 6mA nor manually crafted sequence features, and its performance is superior to that of other DNA 6mA prediction tools. Specifically, 5-fold cross-validation on a benchmark rice dataset gives Deep6mA a sensitivity of 92.96%, a specificity of 95.06%, and an overall prediction accuracy of 94%. Importantly, we find that sequences with 6mA sites share similar patterns across different species. The model trained on rice data predicts the 6mA sites of three other species well: Arabidopsis thaliana, Fragaria vesca and Rosa chinensis, with a prediction accuracy over 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which suggests that the sequence near the 6mA site may be conserved; (2) 6mA is enriched in the TATA box of the promoter, which may be the main way it regulates downstream gene expression.
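As a reading aid for the figures quoted above, the following minimal sketch shows how sensitivity, specificity and accuracy are derived from confusion-matrix counts. The function name `binary_metrics` and the counts are hypothetical and are not taken from the paper's data. When positives and negatives are equally numerous (an assumption here), accuracy is simply the mean of sensitivity and specificity, which is consistent with 92.96% and 95.06% giving roughly 94%.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # fraction of true 6mA sites recovered
    specificity = tn / (tn + fp)                 # fraction of non-6mA sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of all sites classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical, balanced example: 1000 positive and 1000 negative sites.
sn, sp, acc = binary_metrics(tp=930, fn=70, tn=950, fp=50)
print(f"sensitivity={sn:.2%} specificity={sp:.2%} accuracy={acc:.2%}")

# With balanced classes, accuracy equals the mean of sensitivity and specificity.
balanced_mean = (sn + sp) / 2
```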

Highlights

  • DNA methylation modifications such as N4-methylcytosine (4mC), N6-methyladenine (6mA), and 5-methylcytosine (5mC) play important roles in the epigenetic regulation of gene expression without altering the sequence, and they are widely distributed in the genomes of different species [1]

  • We compare the performance of Convolutional Neural Networks (CNN) with that of CNN + Long Short-Term Memory (LSTM), using the same training data under different CNN settings

  • The results show that CNN + LSTM performs better than CNN alone, owing to the ability of LSTM to learn the dependence structure underlying the sequence
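Models of this kind take raw sequence as input rather than hand-crafted features. The text here does not specify the encoding, so the sketch below assumes the common one-hot scheme, in which each base becomes a 4-dimensional indicator vector and unknown bases such as N become all zeros. The names `ALPHABET` and `one_hot` are illustrative only.

```python
# Assumed encoding (not confirmed by the source): A, C, G, T -> one-hot rows;
# any other symbol (e.g. N) -> a row of zeros.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, one row per base."""
    rows = []
    for base in seq.upper():
        rows.append([1 if base == symbol else 0 for symbol in ALPHABET])
    return rows


# Each row could then be fed, position by position, to a CNN or CNN + LSTM.
example = "GAGGN"
for base, row in zip(example, one_hot(example)):
    print(base, row)
```

The resulting matrix has one row per sequence position, which is the shape a 1D convolution followed by an LSTM would typically consume.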


Summary

Introduction

DNA methylation modifications such as N4-methylcytosine (4mC), N6-methyladenine (6mA), and 5-methylcytosine (5mC) play important roles in the epigenetic regulation of gene expression without altering the sequence, and they are widely distributed in the genomes of different species [1]. DNA N6-methyladenine (6mA) refers to methylation at the N6 position of adenine, and in recent years it has been found to play an important role in the epigenetic modification of eukaryotic DNA [2]. Greer et al. [13] developed a method that uses ultra-high-performance liquid chromatography and mass spectrometry to detect the signals of DNA 6mA sites. Such methods advanced research on 6mA. Using 6mA immunoprecipitation, mass spectrometry, and single-molecule real-time sequencing, Zhou et al. [16] found that 0.2% of adenines in the rice genome are 6mA methylated and that GAGG-rich sequences are the most significantly enriched for 6mA.
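To illustrate what checking for GAGG enrichment around a candidate site can look like in practice, here is a small sketch that locates motif occurrences in a sequence window and tests whether one of them overlaps a given adenine. The helper names `find_motif` and `motif_covers_site` and the example window are hypothetical and do not come from the paper.

```python
def find_motif(seq, motif="GAGG"):
    """Return the 0-based start index of every (possibly overlapping) motif hit."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)  # advance by one to allow overlaps
    return hits


def motif_covers_site(seq, site, motif="GAGG"):
    """True if any motif occurrence overlaps the 0-based position `site`."""
    return any(s <= site < s + len(motif) for s in find_motif(seq, motif))


# Hypothetical window; the adenines at index 3 and 9 each sit inside a GAGG hit.
window = "TTGAGGCAGAGGT"
hits = find_motif(window)
inside = motif_covers_site(window, 3)
outside = motif_covers_site(window, 7)
```

Counting how often candidate sites fall inside such hits, compared with background adenines, is one simple way to quantify the kind of enrichment described above.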

