Abstract

N4-methylcytosine (4mC) is an important and common DNA methylation that is widespread in prokaryotes. It plays a crucial role in correcting DNA replication errors and in protecting host DNA against degradation by restriction enzymes. Accurate identification of 4mC sites is therefore important for understanding biological functions and treating genetic diseases. In this paper, we design a novel model for identifying 4mC sites. First, we extract features from the original sequences using multi-source feature representation methods: mono-nucleotide binary and k-mer frequency, dinucleotide binary and position-specific frequency, ring-function-hydrogen-chemical properties, dinucleotide-based DNA properties, and trinucleotide-based DNA properties. Next, a gradient boosting decision tree is applied to select the optimal feature set and remove redundant information. Finally, a support vector machine is employed to classify each site as 4mC or non-4mC. The accuracies on six datasets reach 0.851, 0.859, 0.801, 0.87, 0.859 and 0.901, respectively, which are superior to those of previous prediction methods. These results show that our predictor is a feasible and effective tool for identifying 4mC sites. Furthermore, an online web server is available at http://dnan4c.zhanglab.site.
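
To make the first feature group concrete, the following is a minimal sketch of mono-nucleotide binary encoding and k-mer frequency, written in plain Python. The abstract does not specify the exact encoding, so the base order A, C, G, T, the default k = 2, and the function names `mono_binary`, `kmer_frequency` and `encode` are assumptions chosen for illustration and may differ from the authors' implementation.

```python
from collections import Counter
from itertools import product

# Assumed base order for the encodings; the abstract does not state one.
NUCLEOTIDES = "ACGT"


def mono_binary(seq):
    """One-hot encode each position: A=1000, C=0100, G=0010, T=0001.

    Any other symbol (for example N) is encoded as 0000.
    """
    features = []
    for base in seq.upper():
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def kmer_frequency(seq, k=2):
    """Return the relative frequency of every possible k-mer over A, C, G, T.

    Frequencies are counts divided by the number of overlapping windows,
    listed in lexicographic order (AA, AC, AG, ...).
    """
    seq = seq.upper()
    windows = len(seq) - k + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(windows))
    return [counts["".join(p)] / windows for p in product(NUCLEOTIDES, repeat=k)]


def encode(seq, k=2):
    """Concatenate the two feature groups into one vector (hypothetical helper)."""
    return mono_binary(seq) + kmer_frequency(seq, k)
```

For a sequence of length L, `encode` yields 4L binary features followed by 4^k frequency features. In the pipeline described above, vectors of this kind, joined with the other feature groups, would then be passed to the gradient boosting decision tree for feature selection and to the support vector machine for classification.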
