Abstract
Lysine lactylation (Kla) is a post-translational modification (PTM) that plays an important role in regulating various biological processes. While traditional experimental methods are highly accurate for identifying Kla sites, they are both time-consuming and labor-intensive. Recent machine learning advances have enabled computational models for Kla site prediction. In this study, we propose a novel framework that integrates sequence embedding with sequence descriptors to enhance the representation of protein sequence features. Our framework employs a BiGRU-Transformer architecture to capture both local and global dependencies within the sequence, while incorporating six sequence descriptors to extract biochemical properties and evolutionary patterns. Additionally, we apply a cross-attention fusion mechanism to combine sequence embeddings with descriptor-based features, enabling the model to capture complex interactions between different feature representations. Our model demonstrated excellent performance in predicting Kla sites, achieving an accuracy of 0.998 on the training set and 0.969 on the independent set. Furthermore, through attention analysis and motif discovery, our model provided valuable insights into key sequence patterns and regions that are crucial for Kla modification. This work not only deepens the understanding of Kla's functional roles but also holds the potential to positively impact future research in protein modification prediction and functional annotation.
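To make the described architecture concrete, the following is a minimal sketch of how a BiGRU-Transformer encoder with cross-attention fusion of sequence embeddings and descriptor features could be assembled. It assumes a PyTorch implementation; all dimensions, layer counts, and module names (e.g., `KlaSitePredictor`, `descriptor_dim`) are illustrative assumptions, not the authors' published code.

```python
# Hypothetical sketch of a BiGRU-Transformer with cross-attention fusion,
# loosely following the abstract's description; not the authors' implementation.
import torch
import torch.nn as nn


class KlaSitePredictor(nn.Module):
    def __init__(self, vocab_size=21, embed_dim=128, descriptor_dim=64,
                 num_heads=4, num_layers=2):
        super().__init__()
        # Learned embedding of the residue window around the candidate lysine.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # BiGRU captures local, order-dependent context in both directions.
        self.bigru = nn.GRU(embed_dim, embed_dim // 2,
                            batch_first=True, bidirectional=True)
        # Transformer encoder captures longer-range (global) dependencies.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers)
        # Project concatenated sequence descriptors into the embedding space.
        self.descriptor_proj = nn.Linear(descriptor_dim, embed_dim)
        # Cross-attention: descriptor features query the sequence representation.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads,
                                                batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, seq_tokens, descriptors):
        # seq_tokens: (batch, window_len) integer-encoded residues
        # descriptors: (batch, descriptor_dim) handcrafted descriptor vector
        x = self.embedding(seq_tokens)
        x, _ = self.bigru(x)
        x = self.transformer(x)
        q = self.descriptor_proj(descriptors).unsqueeze(1)  # (batch, 1, embed_dim)
        fused, _ = self.cross_attn(q, x, x)  # attend over sequence positions
        return torch.sigmoid(self.classifier(fused.squeeze(1)))
```

In this sketch the descriptor vector acts as the attention query and the BiGRU-Transformer outputs serve as keys and values, so the fused representation weights sequence positions by their relevance to the biochemical and evolutionary features; other fusion directions or bidirectional cross-attention are equally plausible readings of the abstract.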