Abstract
DNA 5-hydroxymethylcytosine (5hmC), N6-methyladenine (6mA) and N4-methylcytosine (4mC) are three common kinds of DNA modification and are involved in a variety of biological processes. Accurate genome-wide identification of 5hmC, 6mA and 4mC sites is invaluable for better understanding their biological functions. Because experimental methods for the genome-wide detection of 5hmC, 6mA and 4mC are labor-intensive and expensive, computational methods for this purpose are urgently needed. With this in mind, the current study was devoted to constructing a machine learning-based method to identify 5hmC, 6mA and 4mC sites in multiple species. We first encoded positive and negative samples using three schemes: K-tuple nucleotide frequency component, nucleotide chemical property combined with nucleotide frequency, and mono-nucleotide binary encoding. Subsequently, a Random Forest classifier was used to identify 5hmC, 6mA and 4mC sites. Results of five-fold cross-validation and independent dataset tests showed that the proposed method generalizes well, suggesting that it is effective at identifying 5hmC, 6mA and 4mC sites. To make the method convenient to use, a web server called iDNA-MS was established, which is freely accessible at http://lin-group.cn/server/iDNA-MS.
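The abstract names the three encoding schemes but does not define them, so the following is only a minimal sketch of how such features are commonly computed, not the authors' implementation. Several details are assumptions: the function names, the default k = 2, the (ring structure, functional group, hydrogen bonding) mapping used for nucleotide chemical property, the definition of nucleotide frequency as the cumulative density of the current base up to each position, and the order in which the three feature groups are concatenated. Sequences are assumed to contain only A, C, G and T and to be at least k bases long.

```python
from itertools import product

BASES = "ACGT"

# Assumed chemical-property mapping: (ring structure, functional group,
# hydrogen bonding). The abstract does not give these values.
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def binary_encoding(seq):
    """Mono-nucleotide binary (one-hot) encoding: four values per position."""
    return [1 if base == b else 0 for base in seq for b in BASES]


def ktuple_frequency(seq, k=2):
    """Relative frequency of every possible k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping k-mers in seq
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[m] / total for m in kmers]


def ncp_with_frequency(seq):
    """Three chemical-property bits plus cumulative density per position."""
    features = []
    for i, base in enumerate(seq, start=1):
        # Fraction of the first i bases that equal the base at position i.
        density = seq[:i].count(base) / i
        features.extend([*NCP[base], density])
    return features


def encode(seq, k=2):
    """Concatenate the three feature groups into one flat vector."""
    seq = seq.upper()
    return ktuple_frequency(seq, k) + ncp_with_frequency(seq) + binary_encoding(seq)
```

Under these assumptions, a sequence of length n yields 4^k + 4n + 4n features. Vectors built this way for positive and negative samples could then be passed to any Random Forest implementation for training and cross-validation.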