Accurate protein function prediction is crucial for understanding biological processes and advancing biomedical research. However, the rapid growth of protein sequences far outpaces the experimental characterization of their functions, necessitating the development of automated computational methods. We present InterLabelGO+, a hybrid approach that integrates a deep learning-based method with an alignment-based method for improved protein function prediction. InterLabelGO+ incorporates a novel loss function that addresses label dependency and imbalance, and it further enhances performance through dynamic weighting of the alignment-based component. A preliminary version of InterLabelGO+ achieved strong performance in the CAFA5 challenge, ranking sixth out of 1625 participating teams. Comprehensive evaluations on large-scale protein function prediction tasks demonstrate InterLabelGO+'s ability to accurately predict Gene Ontology terms across various functional categories and evaluation metrics. The source code and datasets for InterLabelGO+ are freely available on GitHub at https://github.com/QuanEvans/InterLabelGO. A web server is available at https://seq2fun.dcmb.med.umich.edu/InterLabelGO/. The software is implemented in Python and PyTorch, and is supported on Linux and macOS.
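To make the idea of dynamically weighting an alignment-based component concrete, the following is a minimal illustrative sketch and not the InterLabelGO+ implementation. It assumes that each component yields a per-term confidence score in [0, 1] and that the weight is derived from the sequence identity of the closest alignment hit. The function names `combine_scores` and `weight_from_identity`, the linear blending rule, and the example scores are all hypothetical.

```python
from typing import Dict


def weight_from_identity(max_identity: float) -> float:
    """Hypothetical weighting rule: trust the alignment-based scores more
    when the closest annotated homolog is more similar to the query.
    The identity is clamped to the range [0, 1]."""
    return min(max(max_identity, 0.0), 1.0)


def combine_scores(
    dl_scores: Dict[str, float],
    align_scores: Dict[str, float],
    alignment_weight: float,
) -> Dict[str, float]:
    """Linearly blend per-GO-term scores from two predictors.

    A term missing from one predictor is treated as having score 0.0
    for that predictor (an assumption made for this sketch).
    """
    if not 0.0 <= alignment_weight <= 1.0:
        raise ValueError("alignment_weight must lie in [0, 1]")
    terms = set(dl_scores) | set(align_scores)
    return {
        term: (1.0 - alignment_weight) * dl_scores.get(term, 0.0)
        + alignment_weight * align_scores.get(term, 0.0)
        for term in terms
    }


# Made-up scores for two GO terms, purely for illustration.
dl = {"GO:0003674": 0.9, "GO:0005515": 0.4}
align = {"GO:0005515": 0.8}

w = weight_from_identity(0.5)  # closest hit shares 50% sequence identity
combined = combine_scores(dl, align, w)
```

With a weight of 0.5, a term supported by both components is scored as the average of the two, while a term predicted only by the deep learning component is scored at half its original value. How InterLabelGO+ actually computes its weights is described in the paper and the linked repository.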