Abstract

DNA N6-methyladenine (6mA), as an essential component of epigenetic modification, cannot be neglected in genetic regulation mechanism. The efficient and accurate prediction of 6mA sites is beneficial to the development of biological genetics. Biochemical experimental methods are considered to be time-consuming and laborious. Most of the established machine learning methods have a single dataset. Although some of them have achieved cross-species prediction, their results are not satisfactory. Therefore, we designed a novel statistical model called i6mA-VC to improve the accuracy for 6mA sites. On the one hand, kmer and binary encoding are applied to extract features, and then gradient boosting decision tree (GBDT) embedded method is applied as the feature selection strategy. On the other hand, DNA sequences are represented by vectors through the feature extraction method of ring-function-hydrogen-chemical properties (RFHCP) and the feature selection strategy of ExtraTree. After fusing the two optimal features, a voting classifier based on gradient boosting decision tree (GBDT), light gradient boosting machine (LightGBM) and multilayer perceptron classifier (MLPC) is constructed for final classification and prediction. The accuracy of Rice dataset and M.musculus dataset with five-fold cross-validation are 0.888 and 0.967, respectively. The cross-species dataset is selected as independent testing dataset, and the accuracy reaches 0.848. Through rigorous experiments, it is demonstrated that the proposed predictor is convincing and applicable. The development of i6mA-VC predictor will become an effective way for the recognition of N6-methyladenine sites, and it will also be beneficial for biological geneticists to further study gene expression and DNA modification. In addition, an accessible web-server for i6mA-VC is available from http://www.zhanglab.site/ .

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.