Abstract

As an important epigenetic modification, N4-methylcytosine (4mC) not only controls DNA replication and the cell cycle but also participates in regulating cell differentiation and gene expression. However, its biological functions remain poorly understood. To further reveal the function and regulatory mechanisms of 4mC, it is important to accurately identify 4mC sites and characterize their distribution across the genome. In this study, we propose 4mcDeep-W2VC, a general and efficient deep neural network for identifying 4mC sites. Unlike other methods, our predictor automatically extracts features from DNA sequences. We use the word2vec algorithm to learn distributed representations of k-mers instead of one-hot encoding. Compared with the traditional k-mer method, the distributed representation we obtain captures the latent relationships between k-mers. We then feed the distributed representation of each DNA sequence into a convolutional neural network to extract hidden, higher-level, and more biologically relevant features. Experimental results show that our predictor identifies 4mC sites more accurately than state-of-the-art predictors.
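
The pipeline described above (overlapping k-mers, a dense vector per k-mer, then convolution over the resulting matrix) can be illustrated with the minimal sketch below. It is not the authors' implementation. It assumes k = 3, uses seeded random "toy" vectors as a stand-in for embeddings that would actually be learned by word2vec (for example with a library such as gensim, trained on k-mer "sentences"), and applies a single convolution filter with ReLU and no pooling or classifier head. All function names (`to_kmers`, `toy_vectors`, `embed`, `conv1d`) are hypothetical.

```python
import itertools
import random


def to_kmers(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    if len(seq) < k:
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_vectors(k=3, dim=4, seed=0):
    """Hypothetical placeholder: random dense vectors for all 4**k k-mers.

    In the real method these vectors would be learned by word2vec, so that
    k-mers appearing in similar contexts receive similar vectors.
    """
    rng = random.Random(seed)
    return {
        "".join(p): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for p in itertools.product("ACGT", repeat=k)
    }


def embed(seq, vectors, k=3):
    """Map each k-mer to its vector; the rows form the CNN input matrix."""
    return [vectors[kmer] for kmer in to_kmers(seq, k)]


def conv1d(matrix, kernel, bias=0.0):
    """Valid 1D convolution along the sequence axis, followed by ReLU.

    `kernel` has `w` rows, each as long as one embedding vector, so one
    filter spans `w` consecutive k-mers and all embedding dimensions.
    """
    w = len(kernel)
    out = []
    for i in range(len(matrix) - w + 1):
        s = bias
        for j in range(w):
            s += sum(x * y for x, y in zip(matrix[i + j], kernel[j]))
        out.append(max(0.0, s))  # ReLU activation
    return out


if __name__ == "__main__":
    seq = "ACGTCGGA"
    print(to_kmers(seq))  # 6 overlapping 3-mers
    X = embed(seq, toy_vectors())  # 6 x 4 matrix
    kernel = [[0.5, -0.5, 0.5, -0.5]] * 3  # one filter of width 3
    print(conv1d(X, kernel))  # 4 non-negative feature values
```

Unlike a one-hot encoding, where every pair of distinct k-mers is equally dissimilar, the dense rows produced by `embed` can place related k-mers close together. This is the property the convolution layer then exploits when it scans windows of consecutive k-mers.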
