Bio-molecular event extraction using Support Vector Machine

Sriparna Saha, Amit Majumder, Asif Ekbal, Md Hasanuzzaman

doi:10.1109/icoac.2011.6165192

Abstract

The main goal of Biomedical Natural Language Processing (BioNLP) is to capture biomedical phenomena from textual data by extracting relevant entities, information, and relations between biomedical entities (i.e. proteins and genes). Most published work has extracted only binary relations. Recently, the focus has shifted towards extracting more complex relations in the form of bio-molecular events, which may include several entities or other relations. In this paper we propose an approach for event extraction (detection and classification) of relatively complex bio-molecular events. We treat this as a supervised classification problem and use the well-known Support Vector Machine (SVM) algorithm with statistical and linguistic features that represent morphological, syntactic, and contextual information about the candidate bio-molecular trigger words. We first consider event detection and classification as a two-step process: the first step detects events, and the second step classifies each identified event into one of nine predefined classes. We then treat the problem as a one-step process and perform event detection and classification together. Three-fold cross-validation experiments on the BioNLP 2009 shared task datasets yield overall average recall, precision, and F-measure values of 62.95%, 74.53%, and 68.25%, respectively, for event detection. We observed an overall classification accuracy of 72.50%. When detection and classification are performed together, the proposed approach achieves overall recall, precision, and F-measure values of 57.66%, 55.87%, and 56.75%, respectively.
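The abstract does not say how the F-measure is defined. As a minimal sketch, assuming the balanced F-measure (the harmonic mean of precision and recall), the reported figures can be checked for internal consistency. The function name `f_measure` is a hypothetical helper introduced here for illustration and is not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall; the abstract
    does not state which F-measure variant was used.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract, in percent.
detection = f_measure(precision=74.53, recall=62.95)  # event detection only
joint = f_measure(precision=55.87, recall=57.66)      # detection + classification together

print(f"detection F-measure: {detection:.2f}")
print(f"joint F-measure: {joint:.2f}")
```

Under this assumption, both computed values agree with the reported 68.25% and 56.75% to two decimal places.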

Full Text