Abstract

Background: Extraction of protein-protein interactions (PPIs) reported in scientific publications is a core topic of biomedical text mining. The ultimate goal is to devise a PPI extraction method that performs well on large amounts of unseen text, independently of the training corpus. One popular machine-learning-based approach to PPI extraction builds on convolution kernels, i.e., similarity functions defined on the parse-based representation of sentences and interactions. Kernel functions differ in (1) the underlying sentence representation (bag-of-words, syntax parse trees, dependency graphs), (2) the substructures retrieved from the sentence representation to define interactions, and (3) the calculation of the similarity function.

Highlights

  • Apart from the shortest path between the proteins of the candidate interaction, kBSPS adds all nodes within distance k from this path to the vertex-walk representation

  • We present a novel kernel method called the k-band shortest path spectrum (kBSPS) kernel, an extension of the spectrum tree kernel (SpT)

  • Results: We evaluated the kBSPS kernel on the five standard PPI benchmark corpora (AIMed, BioInfer, HPRD50, IEPA, LLL)
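
The k-band idea in the highlights (the shortest path between the two proteins plus all nodes within distance k of that path) can be illustrated with a small sketch. This is not the authors' implementation: the toy dependency graph, the function names (`build_adj`, `shortest_path`, `k_band`) and the choice to treat the graph as undirected and unweighted are all assumptions made for illustration.

```python
from collections import deque


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path(adj, source, target):
    """Breadth-first search; returns one shortest path as a list of nodes, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nb in sorted(adj[node]):
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return None


def k_band(adj, path, k):
    """Return every node whose distance to some node on `path` is at most k."""
    dist = {node: 0 for node in path}
    queue = deque(path)  # multi-source BFS started from all path nodes
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond the band
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


# Hypothetical toy graph for "PROT1 strongly binds PROT2 in yeast cells".
EDGES = [
    ("binds", "PROT1"),
    ("binds", "PROT2"),
    ("binds", "strongly"),
    ("binds", "cells"),
    ("cells", "yeast"),
]
ADJ = build_adj(EDGES)
PATH = shortest_path(ADJ, "PROT1", "PROT2")
BAND_0 = k_band(ADJ, PATH, 0)  # the shortest path only
BAND_1 = k_band(ADJ, PATH, 1)  # path plus direct neighbours
BAND_2 = k_band(ADJ, PATH, 2)  # path plus nodes up to two steps away
```

With k = 0 the band is just the shortest path; larger values of k pull in progressively more of the sentence context around the candidate interaction.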


Summary

Introduction

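Extraction of protein-protein interactions (PPIs) reported in scientific publications is a core topic of biomedical text mining. We present a novel kernel method called the k-band shortest path spectrum (kBSPS) kernel, an extension of the Spectrum tree kernel (SpT) [1]. It combines three ideas. First, interactions are represented as vertex-walks as in SpT, but adapted to dependency graphs. The kBSPS kernel also includes edge labels in the vertex-walks, exploiting the dependency type of a relationship. In addition, apart from the shortest path between the proteins of the candidate interaction, kBSPS adds all nodes within distance k from this path to the vertex-walk representation.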
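
To make the role of edge labels concrete, the following sketch enumerates edge-labelled vertex-walks and compares two graphs by counting the walks they share. It is a simplified illustration under stated assumptions, not the kBSPS similarity function itself: a vertex-walk is assumed to be an alternating sequence of node and dependency labels that follows edge direction, the similarity is a plain dot product of walk counts, and the names `vertex_walks` and `spectrum_similarity` as well as the example edges are hypothetical.

```python
from collections import Counter


def vertex_walks(edges, q):
    """Count walks with q vertices, encoded as (v1, label, v2, ..., vq).

    `edges` is a list of (head, dependency_label, dependent) triples.
    """
    outgoing = {}
    for head, label, dep in edges:
        outgoing.setdefault(head, []).append((label, dep))
    nodes = {h for h, _, _ in edges} | {d for _, _, d in edges}
    walks = Counter()

    def extend(walk, node, remaining):
        if remaining == 0:
            walks[tuple(walk)] += 1
            return
        for label, nxt in outgoing.get(node, []):
            extend(walk + [label, nxt], nxt, remaining - 1)

    for node in nodes:
        extend([node], node, q - 1)
    return walks


def spectrum_similarity(walks_a, walks_b):
    """Dot product of two walk-count vectors (assumed, simplified similarity)."""
    return sum(count * walks_b[walk] for walk, count in walks_a.items())


# Two hypothetical candidate interactions that differ only in one dependency type.
GRAPH_A = [("binds", "nsubj", "PROT1"), ("binds", "dobj", "PROT2")]
GRAPH_B = [("binds", "nsubj", "PROT1"), ("binds", "prep_with", "PROT2")]

WALKS_A = vertex_walks(GRAPH_A, q=2)
WALKS_B = vertex_walks(GRAPH_B, q=2)
SIM_AB = spectrum_similarity(WALKS_A, WALKS_B)  # only the nsubj walk is shared
SIM_AA = spectrum_similarity(WALKS_A, WALKS_A)  # both walks match themselves
```

Because the dependency type is part of each walk, the two graphs share only one walk even though their node labels are identical; without edge labels they would be indistinguishable at this walk length.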

