Abstract

Information about the physical interactions among proteins is crucial, since protein–protein interactions (PPIs) are central to many biological processes. The experimental techniques used to verify PPIs are vital for characterizing the identified PPIs and assessing their reliability. Much of the information about PPIs and the experimental methods used to detect them is available only in the text of the scientific publications that report them. In this study, we approach the problem of identifying passages that describe experimental methods for physical interactions between proteins as an information retrieval task. The baseline system is based on query matching, where the queries are generated from the names (including synonyms) of the experimental methods in the Proteomics Standard Initiative–Molecular Interactions (PSI-MI) ontology. We propose two methods in which the baseline queries are expanded with additional relevant terms. The first method is a supervised approach, where the most salient terms for each experimental method are obtained by using the term frequency–relevance frequency (tf.rf) metric over 13 articles from our manually annotated data set of 30 full text articles, which is made publicly available. The second method is an unsupervised approach, where the queries for each experimental method are expanded by using the word embeddings of the names of the experimental methods in the PSI-MI ontology. The word embeddings are obtained from a large unlabeled full text corpus. The proposed methods are evaluated on a test set consisting of 17 articles. Both methods achieve higher recall than the baseline, at the cost of some precision. In addition to higher recall, the word embeddings based approach achieves a higher F-measure than both the baseline and the tf.rf based method. We also show that incorporating gene name and interaction keyword identification leads to improved precision and F-measure scores for all three evaluated methods. The tf.rf based approach was developed as part of our participation in the Collaborative Biocurator Assistant Task of the BioCreative V challenge assessment, whereas the word embeddings based approach is a novel contribution of this article.

Database URL: https://github.com/ferhtaydn/biocemid/
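
As a rough illustration of the supervised term selection mentioned above, the sketch below scores terms with tf.rf using the commonly cited definition rf = log2(2 + a / max(1, c)), where a is the number of positive passages containing the term and c is the number of negative passages containing it. The function names, the toy passages and the choice to count term frequency over the positive passages are assumptions made for this example, not details taken from the article.

```python
import math
from collections import Counter


def relevance_frequency(term, positive_docs, negative_docs):
    """rf = log2(2 + a / max(1, c)) for one term.

    a: number of positive passages that contain the term
    c: number of negative passages that contain the term
    """
    a = sum(1 for doc in positive_docs if term in doc)
    c = sum(1 for doc in negative_docs if term in doc)
    return math.log2(2 + a / max(1, c))


def tf_rf_scores(positive_docs, negative_docs):
    """Score every term seen in the positive passages by tf * rf.

    Assumption: tf is the raw count of the term over all positive passages.
    """
    tf = Counter(token for doc in positive_docs for token in doc)
    return {
        term: count * relevance_frequency(term, positive_docs, negative_docs)
        for term, count in tf.items()
    }


# Toy, hand-made passages (tokenized); not data from the annotated corpus.
positive = [["bait", "prey", "yeast"], ["yeast", "two-hybrid", "bait"]]
negative = [["yeast", "cells", "lysed"], ["yeast", "antibody", "beads"]]

scores = tf_rf_scores(positive, negative)
# Terms ranked from most to least salient for the positive category.
ranked_terms = sorted(scores, key=scores.get, reverse=True)
```

In this toy setting "bait" outranks "yeast" even though both occur twice in the positive passages, because "yeast" also appears in the negative passages and so receives a lower rf weight.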

Highlights

  • The functions of proteins are often modulated through their interactions with other proteins

  • The subset of articles was selected according to their availability as full text in ‘PMC Open Access’ (PubMed Central, RRID:SCR_004166) [37], as well as their availability in BioC format. Thirty articles from this subset were randomly selected and annotated by two annotators with backgrounds in natural language processing and information retrieval. The annotations mark passages that describe an experimental method as evidence for a physical protein–protein interaction (PPI), together with the specific method that each passage describes

  • The baseline approach does not need a training or validation set, since the existence of a name or a synonym of an experimental method determines the result of the annotation for that sentence
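
The baseline idea in the last highlight can be sketched as plain dictionary matching: a sentence is tagged with a method whenever one of that method's names or synonyms occurs in it. The miniature lexicon below is hand-written for illustration and is not copied from the PSI-MI ontology; the helper names are likewise hypothetical.

```python
import re

# Illustrative lexicon only: in practice the names and synonyms would be
# read from the PSI-MI ontology rather than typed by hand.
METHOD_SYNONYMS = {
    "two hybrid": ["two hybrid", "two-hybrid", "y2h"],
    "coimmunoprecipitation": [
        "coimmunoprecipitation",
        "co-immunoprecipitation",
        "co-ip",
    ],
}


def compile_queries(synonyms):
    """Build one case-insensitive pattern per method from its synonyms."""
    patterns = {}
    for method, names in synonyms.items():
        # Longest names first so the regex prefers the most specific match.
        ordered = sorted(names, key=len, reverse=True)
        alternation = "|".join(re.escape(name) for name in ordered)
        # Lookarounds stop matches that start or end inside a longer word.
        patterns[method] = re.compile(
            r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE
        )
    return patterns


def annotate_sentence(sentence, patterns):
    """Return every method whose name or synonym occurs in the sentence."""
    return [method for method, pattern in patterns.items()
            if pattern.search(sentence)]


patterns = compile_queries(METHOD_SYNONYMS)
hits = annotate_sentence(
    "The interaction was confirmed by co-IP and a yeast two-hybrid screen.",
    patterns,
)
```

Because the decision depends only on the presence of a listed name, nothing is learned from data, which is why no training or validation set is required for this baseline.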


Summary

Introduction

The functions of proteins are often modulated through their interactions with other proteins. The PPI information in curated databases is extracted manually by human curators from the published literature. Although improvements have been obtained in extracting PPIs from text in recent years [14, 15], enriching PPIs with context information, including the experimental methods used to detect them, has not yet been well studied [16]. Various experimental methods such as ‘affinity capture’, ‘two-hybrid’ and ‘coimmunoprecipitation’ are available for detecting protein interactions [1]. Besides the existence of an interaction between a pair of proteins, the experimental conditions under which the interaction was observed are very important for interpreting and assessing it [16].

