Abstract

Motivation
Protein function prediction is a central problem in bioinformatics, driven by the growing volume of protein sequence data from high-throughput sequencing technologies. Experimental methods for determining protein function are costly and time-consuming, which motivates computational approaches. Deep learning models are powerful tools in this area, but many are not optimized for brain development-related datasets. Understanding protein function in brain development is essential for studying neurodevelopmental disorders. To address this, we developed RecGOBD (Recognition of Gene Ontology-related Brain Development protein function), a model designed to predict protein functions related to key brain development processes.

Results
RecGOBD focuses on ten critical Gene Ontology (GO) terms associated with brain development, extracting and embedding the protein sequences linked to these terms. Using pre-trained models, we generated embeddings that capture both sequence and structural information, and applied attention mechanisms to align them with the relevant GO terms. RecGOBD’s category attention layer improves prediction accuracy for brain development-related terms. Evaluated against five models using the AUROC, AUPR, and Fmax metrics, RecGOBD outperformed all of them on every metric. We also applied the model to predict protein functions related to autism spectrum disorder and analyzed how mutations at protein sites change the associated GO terms. These results highlight RecGOBD’s potential for advancing protein function prediction, particularly in brain development, and for informing research on neurodevelopmental disorders.

Availability and implementation
All Python code associated with this study is available at https://github.com/ZL-Xia/RECGOBD.git.

Supplementary information
Supplementary data are available at Bioinformatics Advances online.