Abstract

N4-methylcytosine (4mC) is a common form of DNA methylation that has been implicated in epigenetic regulation and host defense. Accurate prediction of 4mC sites in DNA sequences helps in exploring the underlying biological processes and mechanisms. For this problem, computational methods based on machine learning (ML) and deep learning (DL) are faster, less complex, and less expensive than experimental detection methods. However, the prediction accuracy of existing computational methods remains unsatisfactory. In this work, we propose a weighted fuzzy system for identifying DNA 4mC sites based on kernel entropy component analysis (KECA), which we name W-TSK-FS-KECA. The model is an improvement on the Takagi-Sugeno-Kang fuzzy system (TSK-FS). We use position-specific trinucleotide propensity (PSTNP) to construct feature vectors on representative benchmark datasets. We then use KECA to compute the reconstruction error. Finally, we add the calculated reconstruction error to the regularization term of TSK-FS as weights to enhance model performance. Comparative experiments with other methods show that W-TSK-FS-KECA achieves good classification performance.
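
The PSTNP feature construction mentioned above can be illustrated with a minimal sketch. It follows the commonly used definition of PSTNP (per-position trinucleotide frequencies in the positive set minus those in the negative set) and is not taken from the paper's own code. The function names (`position_frequencies`, `pstnp_matrix`, `pstnp_encode`) are hypothetical, and the sketch assumes equal-length sequences containing only A, C, G, and T.

```python
from itertools import product

# All 64 trinucleotides in a fixed order, plus a lookup from trinucleotide to row index.
TRINUCS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRINUCS)}


def position_frequencies(seqs):
    """Return a 64 x (L-2) matrix of per-position trinucleotide frequencies.

    Assumption: all sequences share the same length L and use only A/C/G/T.
    """
    n_pos = len(seqs[0]) - 2
    freq = [[0.0] * n_pos for _ in TRINUCS]
    for s in seqs:
        for j in range(n_pos):
            freq[INDEX[s[j:j + 3]]][j] += 1.0 / len(seqs)
    return freq


def pstnp_matrix(pos_seqs, neg_seqs):
    """Propensity matrix: frequencies in positives minus frequencies in negatives."""
    f_pos = position_frequencies(pos_seqs)
    f_neg = position_frequencies(neg_seqs)
    return [[p - n for p, n in zip(row_p, row_n)]
            for row_p, row_n in zip(f_pos, f_neg)]


def pstnp_encode(seq, z):
    """Encode one sequence as the (L-2) propensity values of its trinucleotides."""
    return [z[INDEX[seq[j:j + 3]]][j] for j in range(len(seq) - 2)]


if __name__ == "__main__":
    # Toy sequences for illustration only, not real 4mC data.
    positives = ["ACGTA", "ACGTT"]
    negatives = ["TTGTA", "ACGAA"]
    z = pstnp_matrix(positives, negatives)
    print(pstnp_encode("ACGTA", z))
```

In practice the propensity matrix would be estimated from training sequences only, so that test sequences do not leak into the features. The resulting vectors would then be passed to the KECA and TSK-FS stages, which this sketch does not cover.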
