Abstract

Background: Detection of sentences that describe protein-protein interactions (PPIs) in biomedical publications is a challenging and unresolved pattern recognition problem. Many state-of-the-art approaches for this task employ kernel classification methods, in particular support vector machines (SVMs). In this work we propose a novel data integration approach that utilises semantic kernels and a kernel classification method that is a probabilistic analogue to SVMs. Semantic kernels are created from statistical information gathered from large amounts of unlabelled text using lexical semantic models. Several semantic kernels are then fused into an overall composite classification space. In this initial study, we use simple features in order to examine whether the use of combinations of kernels constructed using word-based semantic models can improve PPI sentence detection.

Results: We show that combinations of semantic kernels lead to statistically significant improvements in recognition rates and receiver operating characteristic (ROC) scores over the plain Gaussian kernel, when applied to a well-known labelled collection of abstracts. The proposed kernel composition method also allows us to automatically infer the most discriminative kernels.

Conclusions: The results from this paper indicate that using semantic information from unlabelled text, and combinations of such information, can be valuable for classification of short texts such as PPI sentences. This study, however, is only a first step in evaluation of semantic kernels and probabilistic multiple kernel learning in the context of PPI detection. The method described herein is modular, and can be applied with a variety of feature types, kernels, and semantic models, in order to facilitate full extraction of interacting proteins.

Highlights

  • Detection of sentences that describe protein-protein interactions (PPIs) in biomedical publications is a challenging and unresolved pattern recognition problem

  • It is difficult to compare the results to full PPI extraction tasks, so we provide single kernel baseline results

  • This paper describes a smoothing approach, which is similar to the methods using semantic kernels created from WordNet [72] or Wikipedia information [73,74]


Summary

Introduction

Detection of sentences that describe protein-protein interactions (PPIs) in biomedical publications is a challenging and unresolved pattern recognition problem. Several semantic kernels are fused into an overall composite classification space. In this initial study, we use simple features in order to examine whether the use of combinations of kernels constructed using word-based semantic models can improve PPI sentence detection. Researchers typically find reports of PPIs through search engines that index the biomedical literature. Ad hoc query-based searches, however, are more appropriate for temporary information needs than for persistent ones [2]. For research tasks such as pathway construction or population of PPI databases such as KEGG [3], MIPS [4], or BIND [5], PPI extraction becomes a continuous process. The aim is to develop applications that will enable habitual PPI searchers to find interactions without having to specify pairs of proteins or manually scan large amounts of text.
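The idea of fusing several kernels into one composite classification space can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the random bag-of-words matrix `X`, the word-by-concept matrix `S`, and the fixed weights are all assumptions made for illustration. In particular, the paper infers kernel weights with probabilistic multiple kernel learning, whereas this sketch simply takes a convex combination with hand-picked weights.

```python
import numpy as np


def gaussian_kernel(X, Y, gamma=1.0):
    """Plain Gaussian (RBF) kernel between the rows of X and Y."""
    sq_dist = (
        (X ** 2).sum(axis=1)[:, None]
        + (Y ** 2).sum(axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def semantic_kernel(X, Y, S):
    """Linear kernel after semantic smoothing: (X S)(Y S)^T.

    S is a hypothetical word-by-concept matrix that would, in practice,
    be estimated from unlabelled text by a lexical semantic model.
    """
    return (X @ S) @ (Y @ S).T


def combine_kernels(kernels, weights):
    """Convex combination of kernel matrices (weights are normalised)."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))


# Toy data: 4 "sentences" over a 6-word vocabulary, 3 latent concepts.
rng = np.random.default_rng(0)
X = rng.random((4, 6))   # assumed bag-of-words features
S = rng.random((6, 3))   # assumed semantic model output

K_gauss = gaussian_kernel(X, X, gamma=0.5)
K_sem = semantic_kernel(X, X, S)
K = combine_kernels([K_gauss, K_sem], weights=[0.3, 0.7])
```

Because a non-negative weighted sum of positive semi-definite kernels is itself positive semi-definite, `K` can be passed to any kernel classifier that accepts a precomputed Gram matrix.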
