Simple and efficient machine learning frameworks for identifying protein-protein interaction relevant articles and experimental methods used to study the interactions

Shashank Agarwal, Hong Yu, Feifan Liu

doi:10.1186/1471-2105-12-s8-s10

Open Access

Journal: BMC Bioinformatics
Publication Date: Oct 3, 2011
Citations: 33
License type: CC BY 2.0

Affiliation: University of Wisconsin–Milwaukee

Abstract

Background: Protein-protein interaction (PPI) is an important biomedical phenomenon. Automatically detecting PPI-relevant articles and identifying methods that are used to study PPI are important text mining tasks. In this study, we have explored domain independent features to develop two open source machine learning frameworks. One performs binary classification to determine whether the given article is PPI relevant or not, named “Simple Classifier”, and the other one maps the PPI relevant articles with corresponding interaction method nodes in a standardized PSI-MI (Proteomics Standards Initiative-Molecular Interactions) ontology, named “OntoNorm”.

Results: We evaluated our system in the context of BioCreative challenge competition using the standardized data set. Our systems are amongst the top systems reported by the organizers, attaining 60.8% F1-score for identifying relevant documents, and 52.3% F1-score for mapping articles to interaction method ontology.

Conclusion: Our results show that domain-independent machine learning frameworks can perform competitively well at the tasks of detecting PPI relevant articles and identifying the methods that were used to study the interaction in such articles.

Availability: Simple Classifier is available at http://sourceforge.net/p/simpleclassify/home/ and OntoNorm at http://sourceforge.net/p/ontonorm/home/.
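The F1-scores quoted in the abstract are the harmonic mean of precision and recall. The following is a minimal sketch of that metric, not the evaluation script used in the challenge; the function name `f1_score` and the counts in the usage line are illustrative assumptions.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    Returns 0.0 when precision or recall is undefined or both are zero.
    """
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the call:
# 6 relevant articles found, 2 false alarms, 4 relevant articles missed.
example_f1 = f1_score(6, 2, 4)
```

With these made-up counts, precision is 0.75 and recall is 0.6, so the harmonic mean sits between the two and closer to the lower value.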

Highlights

Protein-protein interactions (PPI) are responsible for many biological phenomena
Availability: Simple Classifier is available at http://sourceforge.net/p/simpleclassify/home/ and OntoNorm at http://sourceforge.net/p/ontonorm/home/
For the article classification task (ACT), we found that Support Vector Machine (SVM)-based classifiers performed better than Naïve Bayes Multinomial (NBM)-based classifiers, although during tuning NBM-based classifiers performed better
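To make the NBM side of that comparison concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes classifier for binary document classification. It is not the Simple Classifier implementation: the function names `train_nbm` and `predict_nbm`, the add-one smoothing parameter `alpha`, and the four toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_nbm(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on tokenized documents with additive smoothing."""
    vocab = {tok for doc in docs for tok in doc}
    class_doc_counts = Counter(labels)
    token_counts = {c: Counter() for c in class_doc_counts}
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    model = {}
    for c, n_docs in class_doc_counts.items():
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        model[c] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_likelihood": {
                tok: math.log((token_counts[c][tok] + alpha) / denom)
                for tok in vocab
            },
        }
    return model


def predict_nbm(model, doc):
    """Return the class with the highest log posterior for a tokenized document."""
    scores = {}
    for c, params in model.items():
        score = params["log_prior"]
        for tok in doc:
            # Tokens never seen in training carry no evidence and are skipped.
            if tok in params["log_likelihood"]:
                score += params["log_likelihood"][tok]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy corpus: True marks a PPI-relevant document.
toy_docs = [
    ["protein", "binds", "protein"],
    ["yeast", "two", "hybrid", "interaction"],
    ["patient", "survey", "results"],
    ["clinical", "trial", "results"],
]
toy_labels = [True, True, False, False]
model = train_nbm(toy_docs, toy_labels)
prediction = predict_nbm(model, ["protein", "interaction"])
```

On this toy corpus the model labels `["protein", "interaction"]` as relevant, because both tokens occur only in the relevant training documents. A real system would add proper tokenization, feature selection, and held-out evaluation.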

Summary

Introduction

Protein-protein interactions (PPI) are responsible for many biological phenomena. Understanding these interactions can greatly benefit biological research; for example, it can help us understand the causes of certain diseases, which can in turn lead to the development of therapeutic interventions. The importance of PPIs has led to the development of several curated databases, including IntAct [2], BioGRID [3] and MINT [4]. These databases are generally curated manually and store information including the proteins that interact with each other, the articles in which these interactions were detected and the methods that were used to discover these interactions. Manually curating articles for PPIs is a time-consuming process, and because of the fast rate of research and the rapid increase in the amount of published literature, the effort required to maintain such databases has increased significantly. This has spurred the development of text-mining approaches to automate the identification of such interactions and assist the manual curation process. In this study, we explored domain-independent features to develop two open source machine learning frameworks. One, named “Simple Classifier”, performs binary classification to determine whether a given article is PPI relevant or not. The other, named “OntoNorm”, maps PPI-relevant articles to the corresponding interaction method nodes in the standardized PSI-MI (Proteomics Standards Initiative-Molecular Interactions) ontology.
