Protein-nucleic acid interactions play a crucial role in many physiological processes. Identifying nucleic acid-binding sites on the protein surface is a prerequisite for understanding the molecular recognition mechanisms between the two types of macromolecules, and it also provides the information needed to design or generate molecular modulators that target these sites to manipulate biological function according to specific requirements. Existing studies mainly focus on characterizing the local surface around each site, often neglecting the interrelationships among sites and the global protein information. To address this gap, we propose NesT-NABind, a Nested Transformer for Nucleic Acid-Binding site prediction. The model leverages the Transformer's strengths in contextual understanding and in capturing long-range dependencies. Specifically, we introduce a local patch-scale Transformer to process surface information around each site and a global protein-scale Transformer to integrate surface and sequence information across the entire protein. The two Transformers operate at different scales of the protein, hence the term "nested". Experiments demonstrate that NesT-NABind achieves a 5.57% improvement in F1 score and a 3.64% improvement in AUPRC compared to state-of-the-art methods. With the incorporation of global features, NesT-NABind shows enhanced predictive capability on challenging large proteins and can therefore be used in a much wider range of applications.