T-cell receptors (TCRs) elicit and mediate the adaptive immune response by recognizing antigenic peptides, a process pivotal for cancer immunotherapy, vaccine design, and autoimmune disease management. Understanding the binding patterns between TCRs and peptides is critical for advancing these clinical applications. Although several computational tools have been developed, they neglect the directional semantics inherent in sequence data, which are essential for accurately characterizing TCR-peptide interactions. To address this gap, we develop TPepRet, a model that combines subsequence mining with semantic integration. TPepRet pairs a Bidirectional Gated Recurrent Unit (BiGRU) network, which captures bidirectional sequence dependencies, with a large language model framework that analyzes both subsequences and global sequences, enabling it to accurately decipher the semantic binding relationship between TCRs and peptides. We have evaluated TPepRet across a range of challenging scenarios: performance benchmarking against other tools on diverse datasets, analysis of peptide binding preferences, characterization of T-cell clonal expansion, identification of true binders in complex environments, assessment of key binding sites through alanine scanning, validation against expression rates from large-scale datasets, and screening of SARS-CoV-2 TCRs. The results suggest that TPepRet outperforms existing tools. We believe TPepRet will become an effective tool for understanding TCR-peptide binding in clinical treatment. The source code can be obtained from https://github.com/CSUBioGroup/TPepRet.git. Supplementary data are available at Bioinformatics online.
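The alanine scanning mentioned above can be illustrated with a minimal sketch: each non-alanine residue of a peptide is replaced by alanine in turn, and the drop in a binding score relative to the unmutated peptide indicates how much that position contributes. The function names `alanine_scan` and `score_drops`, and the toy scoring function in the usage lines, are hypothetical placeholders for illustration and are not part of the TPepRet code base; in practice the scorer would be a trained binding predictor.

```python
from typing import Callable, Dict, List, Tuple


def alanine_scan(peptide: str) -> List[Tuple[int, str, str]]:
    """Return (position, original_residue, mutant_peptide) for every
    position whose residue is not already alanine."""
    variants = []
    for i, residue in enumerate(peptide):
        if residue == "A":
            continue  # substituting A with A would change nothing
        mutant = peptide[:i] + "A" + peptide[i + 1:]
        variants.append((i, residue, mutant))
    return variants


def score_drops(peptide: str, score_fn: Callable[[str], float]) -> Dict[int, float]:
    """Map each scanned position to (baseline score - mutant score).

    `score_fn` is an assumed callable that takes a peptide string and
    returns a binding score; a larger drop suggests a more important site.
    """
    baseline = score_fn(peptide)
    return {i: baseline - score_fn(mutant) for i, _, mutant in alanine_scan(peptide)}


# Toy stand-in scorer (counts phenylalanine residues); NOT a real predictor.
toy_score = lambda p: float(p.count("F"))
drops = score_drops("GILGFVFTL", toy_score)
```

With the toy scorer, only the two phenylalanine positions show a nonzero drop; a real model would produce a graded profile across the peptide.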