STRING-ing together protein complexes: corpus and methods for extracting physical protein interactions from the biomedical literature.

Farrokh Mehryary, Katerina Nastou, Tomoko Ohta, Lars Juhl Jensen, Sampo Pyysalo

doi:10.1093/bioinformatics/btae552

Abstract

Understanding biological processes relies heavily on curated knowledge of physical interactions between proteins. Yet, a notable gap remains between the information stored in databases of curated knowledge and the plethora of interactions documented in the scientific literature. To bridge this gap, we introduce ComplexTome, a manually annotated corpus designed to facilitate the development of text-mining methods for the extraction of complex formation relationships among biomedical entities, targeting the downstream semantics of the physical interaction subnetwork of the STRING database. This corpus comprises 1287 documents with ∼3500 relationships. We train a novel relation extraction model on this corpus and find that it identifies physical protein interactions with high reliability (F1-score = 82.8%). We additionally enhance the model's capabilities through unsupervised trigger word detection and apply it to extract relations and trigger words for these relations from all open publications in the domain literature. This information has been fully integrated into the latest version of the STRING database. We provide the corpus, code, and all results produced by the large-scale runs of our systems on biomedical literature via Zenodo (https://doi.org/10.5281/zenodo.8139716), GitHub (https://github.com/farmeh/ComplexTome_extraction), and the latest version of the STRING database (https://string-db.org/).
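For readers unfamiliar with the F1-score reported above, the following is a minimal sketch of how such a score can be computed for extracted relations. It assumes that each relation is an unordered pair of protein names and that a prediction counts as correct only when the same pair appears in the gold annotations; the function name `relation_f1` and the toy pairs are hypothetical and are not taken from the ComplexTome corpus or the authors' code, whose exact evaluation protocol may differ.

```python
def relation_f1(gold, predicted):
    """Return (precision, recall, F1) over sets of unordered protein pairs."""
    # Assumption: complex formation is symmetric, so (A, B) equals (B, A).
    gold_pairs = {frozenset(pair) for pair in gold}
    pred_pairs = {frozenset(pair) for pair in predicted}

    true_positives = len(gold_pairs & pred_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy, made-up annotations used only to illustrate the arithmetic.
gold = [("A", "B"), ("B", "C"), ("C", "D")]
predicted = [("B", "A"), ("C", "D"), ("A", "D")]

precision, recall, f1 = relation_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case two of the three predicted pairs match the gold pairs, so precision, recall, and F1 all come out to two thirds.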

Journal: Bioinformatics (Oxford, England)
Publication Date: Sep 2, 2024
Citations: 2

Similar Papers

Abstract P6-06-01: Analyzing the physical and functional protein interaction landscape of breast cancer
M Kim ... M Soucheray
Cancer Research | VOL. 79
15 Feb 2019

Predicting Physical Interactions between Protein Complexes
Trevor Clancy ... Eivind Hovig
Molecular & Cellular Proteomics | VOL. 12
01 Jun 2013

The Impact of Gene Expression Regulation on Evolution of Extracellular Signaling Pathways
Varodom Charoensawan ... Sarah A Teichmann
Molecular & Cellular Proteomics | VOL. 9
01 Dec 2010

Probing Genuine Strong Interactions and Post-translational Modifications in the Heterogeneous Yeast Exosome Protein Complex
Silvia A Synowsky ... Albert J.R Heck
Molecular & Cellular Proteomics | VOL. 5
01 Sep 2006
