4 mC site recognition algorithm based on pruned pre-trained DNABert-Pruning model and fused artificial feature encoding

Guo-Bo Xie, Yi Yu, Zhi-Yi Lin, Rui-Bin Chen, Jian-Hui Xie, Zhen-Guo Liu

doi:10.1016/j.ab.2024.115492

Abstract

DNA 4 mC plays a crucial role in gene expression in organisms. However, existing deep learning algorithms are limited in their ability to represent DNA sequence features. In this paper, we propose DNABert-4mC, a 4 mC site identification algorithm that fuses the pruned pre-trained model DNABert-Pruning with artificial feature encoding. The algorithm prunes and compresses the DNABert model to obtain DNABert-Pruning, which has fewer parameters and less redundancy in its output features, yielding more precise feature representations while maintaining accuracy. In addition, the algorithm constructs an artificial feature encoding module that assists DNABert-Pruning in feature representation, supplementing information that is missing from the pre-trained features. The algorithm also introduces the AFF-4mC fusion strategy, which combines artificial feature encoding with the DNABert-Pruning model to improve the representation of DNA sequences in multi-semantic spaces and to better capture 4 mC sites and the distribution of nucleotide importance within the sequence. In experiments on six independent test sets, DNABert-4mC achieved an average AUC of 93.81%, outperforming seven other advanced algorithms by 2.05%, 5.02%, 11.32%, 5.90%, 12.02%, 2.42%, and 2.34%, respectively.
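
As a rough illustration of the ingredients the abstract names, the sketch below shows a hand-crafted encoding, the overlapping k-mer tokens that a DNABERT-style model consumes, and a simple way to fuse two feature vectors. The abstract does not specify which artificial encodings are used or how AFF-4mC works internally, so the one-hot encoding, the function names (`one_hot`, `kmer_tokens`, `gated_fusion`), and the sigmoid gate are all assumptions made for illustration, not the authors' method.

```python
import math

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Hand-crafted encoding (assumed): one 4-dim indicator vector per nucleotide.

    Characters outside A/C/G/T map to an all-zero vector.
    """
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq.upper()]


def kmer_tokens(seq, k=6):
    """Split a sequence into overlapping k-mers, as DNABERT-style tokenizers do."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def gated_fusion(learned, handcrafted):
    """Hypothetical fusion: a per-dimension sigmoid gate mixes two vectors.

    The gate here is computed from the inputs themselves; in a trained model
    it would come from learned attention weights instead.
    """
    if len(learned) != len(handcrafted):
        raise ValueError("feature vectors must have the same length")
    fused = []
    for a, b in zip(learned, handcrafted):
        gate = 1.0 / (1.0 + math.exp(-(a + b)))  # stand-in for a learned weight
        fused.append(gate * a + (1.0 - gate) * b)
    return fused


if __name__ == "__main__":
    seq = "ACGTCA"
    print(one_hot(seq)[0])        # indicator vector for the first base
    print(kmer_tokens(seq, k=3))  # overlapping 3-mers
    print(gated_fusion([0.0, 2.0], [0.0, -2.0]))
```

In a real pipeline the `learned` vector would come from the pruned pre-trained model and the `handcrafted` vector from the artificial feature encoding module, projected to a common dimension before fusion.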

Journal: Analytical Biochemistry
Publication Date: Mar 6, 2024
Citations: 1

Similar Papers

Automatic segmentation of hepatocellular carcinoma on dynamic contrast-enhanced MRI based on deep learning
Xiao Luo ... Daoying Geng
Physics in Medicine & Biology | VOL. 69
12 Mar 2024

Development and Validation of a Raman Spectroscopic Classification Model for Cervical Intraepithelial Neoplasia (CIN).
Damien Traynor ... Kate Cuschieri
Cancers | VOL. 14
06 Apr 2022

DrugFinder: Druggable Protein Identification Model Based on Pre-Trained Models and Evolutionary Information
Mu Zhang ... Fengqiang Wan
Algorithms | VOL. 16
25 May 2023

Preoperative prediction of lymph node metastasis using deep learning-based features
Renee Cattell ... Chuan Huang
Visual Computing for Industry, Biomedicine, and Art | VOL. 5
07 Mar 2022
