Detecting protein-protein interactions (PPIs) is crucial for understanding genetic mechanisms, disease pathogenesis, and drug design. As the biomedical literature continues to grow rapidly, there is an increasing need for automated, accurate extraction of these interactions to facilitate scientific discovery. Pretrained language models, such as generative pretrained transformers (GPT) and bidirectional encoder representations from transformers (BERT), have shown promising results on natural language processing tasks. We evaluated multiple transformer-based models for PPI identification on three manually curated gold-standard corpora: Learning Language in Logic (164 interactions in 77 sentences), Human Protein Reference Database (163 interactions in 145 sentences), and Interaction Extraction Performance Assessment (335 interactions in 486 sentences). BERT-based models achieved the best overall performance, with BioBERT attaining the highest recall (91.95%) and F1 score (86.84%) on the Learning Language in Logic dataset. Despite not being explicitly trained on biomedical text, GPT-4 performed comparably to the BERT-based models, achieving the highest precision (88.37%) along with a recall of 85.14% and an F1 score of 86.49% on the same dataset. These results suggest that GPT-4 can effectively detect protein interactions from text, offering valuable applications in biomedical literature mining. The source code and datasets used in this study are available at https://github.com/hurlab/PPI-GPT-BERT.
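To make the evaluation setup concrete, the sketch below shows one common way to run sentence-level PPI classification with a BERT-based biomedical model via the Hugging Face Transformers library. This is a minimal illustration, not the authors' pipeline: the checkpoint name, two-class label mapping, and example sentence are assumptions, and in practice the classification head must first be fine-tuned on a PPI corpus such as the three used in this study.

```python
# Minimal sketch of sentence-level PPI classification with BioBERT.
# Assumptions: the "dmis-lab/biobert-base-cased-v1.1" checkpoint, a binary
# label scheme where class 1 means "interaction present", and an illustrative
# input sentence. The untuned head yields arbitrary outputs until fine-tuned.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

# Hypothetical sentence with two protein mentions.
sentence = "BRCA1 interacts with BARD1 to form an E3 ubiquitin ligase complex."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits

# Probability that the sentence describes an interaction (class 1).
prob_interaction = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(interaction) = {prob_interaction:.3f}")
```

Corpus-level precision, recall, and F1 scores like those reported above would then be computed by comparing such per-sentence (or per-pair) predictions against the gold-standard annotations.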