A Comprehensive Benchmark of Kernel Methods to Extract Protein–Protein Interactions from Literature

Domonkos Tikk, Ulf Leser, Jörg Hakenberg, Philippe Thomas, Peter Palaga, Andrey Rzhetsky

doi:10.1371/journal.pcbi.1000837

Abstract

The most important way of conveying new findings in biomedical research is scientific publication. Extraction of protein–protein interactions (PPIs) reported in scientific publications is one of the core topics of text mining in the life sciences. Recently, a new class of such methods has been proposed: convolution kernels that identify PPIs using deep parses of sentences. However, comparing published results of different PPI extraction methods is impossible due to the use of different evaluation corpora, different evaluation metrics, different tuning procedures, etc. In this paper, we study whether the reported performance metrics are robust across different corpora and learning settings and whether the use of deep parsing actually leads to an increase in extraction quality. Our ultimate goal is to identify the one method that performs best in real-life scenarios, where information extraction is performed on unseen text and not on specifically prepared evaluation data. We performed a comprehensive benchmarking of nine different methods for PPI extraction that use convolution kernels on rich linguistic information. Methods were evaluated on five different public corpora using cross-validation, cross-learning, and cross-corpus evaluation. Our study confirms that kernels using dependency trees generally outperform kernels based on syntax trees. However, our study also shows that only the best kernel methods can compete with a simple rule-based approach when the evaluation prevents information leakage between training and test corpora. Our results further reveal that the F-score of many approaches drops significantly if no corpus-specific parameter optimization is applied and that methods reaching a good AUC score often perform much worse in terms of F-score. We conclude that for most kernels no sensible estimation of PPI extraction performance on new text is possible, given the current heterogeneity in evaluation data. Nevertheless, our study shows that three kernels are clearly superior to the other methods.
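
The gap between AUC and F-score noted above can be illustrated with a small sketch. It is not taken from the benchmark itself: the labels, scores, and thresholds are made up, and the helper names `f_score` and `auc` are hypothetical. The decision threshold stands in, loosely, for a corpus-specific parameter. AUC depends only on how candidate pairs are ranked, whereas F-score depends on where the decision threshold is placed.

```python
def f_score(labels, scores, threshold):
    """F1 of the positive class when predicting positive iff score > threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s > threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s <= threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up classifier outputs: 2 interacting pairs (label 1), 6 non-interacting pairs (label 0).
labels = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

ranking_quality = auc(labels, scores)                # 1.0: every positive outranks every negative
f_default = f_score(labels, scores, threshold=0.0)   # 0.4: untuned threshold accepts every pair
f_tuned = f_score(labels, scores, threshold=0.75)    # 1.0: threshold tuned to this data
```

With a perfect ranking, the untuned threshold still yields a low F-score because every negative pair is accepted; only a threshold fitted to this particular data recovers the full F-score.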

Highlights

Protein-protein interactions (PPIs) are integral to virtually all cellular processes, such as metabolism, signaling, regulation, and proliferation
We show that certain classes of methods for extracting PPIs are clearly superior to other classes
We studied how these methods behave in different evaluation settings, using different parameter sets, and on different gold standard corpora

Summary

Introduction

Protein-protein interactions (PPIs) are integral to virtually all cellular processes, such as metabolism, signaling, regulation, and proliferation. Results of high-throughput techniques (such as two-hybrid screens and mass spectrometry) are usually published in tabular form and can be imported quickly by renowned PPI databases. These techniques are prone to producing comparatively large numbers of false positives [7]. Other techniques, such as coimmunoprecipitation, cross-linking, or rate-zonal centrifugation, produce more reliable results but are small-scale; these are typically used to verify interesting yet putative interactions, possibly first hypothesized during large-scale experiments [8]. Authors have started to submit results directly to PPI databases on a regular basis, often as a step required by publishers to ensure quality.

Objectives

Methods

Results

Discussion

Conclusion