Abstract

N4-acetylcytidine (ac4C) plays a crucial role in regulating cellular biological processes, particularly gene expression and disease development. However, identifying ac4C sites through wet-lab experiments is time-consuming and costly, and existing learning-based methods struggle to capture the underlying semantic knowledge and relations within sequences. To address this, we propose NBCR-ac4C, a deep learning approach based on pretrained models. Specifically, we employ Nucleotide Transformer and DNABERT2 to construct contextual embeddings of nucleotide sequences, which effectively mine and express the contextual relations between different features in a sequence. A convolutional neural network (CNN) and ResNet18 are then applied to extract shallow and deep knowledge, respectively, from the contextual embeddings. Based on extensive experiments on predicting ac4C sites in nucleotide sequences, we observe that NBCR-ac4C outperforms general learning-based models. It achieves the highest accuracy (ACC) of 83.51% and an area under the receiver operating characteristic curve (AUROC) of 89.58% on an independent test set. Moreover, compared with the current state-of-the-art (SOTA) model LSA-ac4C, the proposed model improves ACC by 0.81-3.7% and AUROC by 0.05-1.58%. The data set and code are available at https://github.com/2103374200/NBCR to facilitate further discussion of NBCR-ac4C.
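For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how ACC and AUROC can be computed for a binary site classifier. It is not the evaluation code from the NBCR repository; the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration only.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label.

    The 0.5 threshold is an illustrative assumption, not taken from the paper.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = ac4C site, 0 = non-ac4C site.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

print(f"ACC   = {accuracy(labels, scores):.4f}")
print(f"AUROC = {auroc(labels, scores):.4f}")
```

ACC depends on the chosen decision threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the two are usually reported together.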
