Identifying enzyme function is crucial for understanding the mechanisms of biological activity and for advancing the life sciences. However, existing methods for predicting Enzyme Commission (EC) numbers do not fully exploit protein sequence information, and their identification accuracy remains limited. To address this issue, we propose an EC number prediction network using hierarchical features and global features (ECPN-HFGF). The method first uses residual networks to extract generic features from protein sequences, and then applies a hierarchical feature extraction module and a global feature extraction module to obtain hierarchical and global features of those sequences. The predictions derived from the two feature types are then combined, and a multitask learning framework is used to predict enzyme EC numbers. In our experiments, ECPN-HFGF performed best among the compared methods at predicting EC numbers for protein sequences, achieving macro F1 and micro F1 scores of 95.5% and 99.0%, respectively. By combining hierarchical and global features of protein sequences, ECPN-HFGF enables rapid and accurate EC number prediction. Compared with commonly used methods, it offers significantly higher prediction accuracy, providing an efficient tool for enzymology research and enzyme engineering applications.
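The two reported metrics weight classes differently, which matters when some EC numbers are rare. The following is a minimal sketch of the standard macro and micro F1 definitions for single-label predictions, not the evaluation code used in this work; the function names and the toy EC labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro F1, micro F1) for single-label class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[true] += 1  # true class gains a false negative
    classes = set(tp) | set(fp) | set(fn)
    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: pool the counts over all classes, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy labels (illustrative only): one frequent class and one rare class.
y_true = ["1.1.1.1", "1.1.1.1", "1.1.1.1", "2.7.1.1"]
y_pred = ["1.1.1.1", "1.1.1.1", "1.1.1.1", "1.1.1.1"]
macro, micro = macro_micro_f1(y_true, y_pred)
```

In this toy case the single missed rare class pulls macro F1 down to 3/7 (about 0.43) while micro F1 stays at 0.75, which illustrates why a macro F1 lower than the micro F1 usually points to weaker performance on infrequent classes.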