A Verb-Centric Approach for Relationship Extraction in Biomedical Text

Abhishek Sharma, Hui Yang, Rajesh Swaminathan

doi:10.1109/icsc.2010.14

Abstract

Advances in biomedical technology and research have produced a large number of research findings, which are published primarily as unstructured text such as journal articles. Text mining techniques have thus been employed to extract knowledge from such data. In this article we focus on the task of identifying and extracting relations between bio-entities such as green tea and breast cancer. Unlike previous work that employs heuristics such as co-occurrence patterns and handcrafted syntactic rules, we propose a verb-centric algorithm. This algorithm identifies and extracts the main verb(s) in a sentence and therefore does not require predefined rules or patterns. Using the main verb(s), it then extracts the two entities involved in a relationship. The biomedical entities are identified from a dependency parse tree by applying syntactic and linguistic features such as prepositional phrases and semantic role analysis. The proposed verb-centric approach can effectively handle complex sentence structures such as clauses and conjunctive sentences. We evaluate the algorithm on several data sets and achieve an average F-score of 0.905, which is significantly higher than that of previous work.
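
To make the verb-centric idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a sentence has already been dependency-parsed; the `Token` structure, the label names (`nsubj`, `dobj`, `prep`, `pobj`, `compound`, `amod`), the helper functions, and the hand-annotated example sentence are all hypothetical choices made for illustration. The sketch locates the main verb at the root of the parse, then takes its subject and its direct or prepositional object as the two entities.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Token:
    idx: int    # position in the sentence
    text: str   # surface form
    pos: str    # coarse part-of-speech tag (assumed tag set)
    head: int   # index of the syntactic head; the root points to itself
    dep: str    # dependency label (assumed label set)


def children(tokens: List[Token], head_idx: int, labels: Set[str]) -> List[Token]:
    """Return tokens attached to head_idx whose dependency label is in labels."""
    return [t for t in tokens
            if t.head == head_idx and t.idx != head_idx and t.dep in labels]


def phrase(tokens: List[Token], head: Token) -> str:
    """Join a head noun with its directly attached compound/adjectival modifiers."""
    parts = children(tokens, head.idx, {"compound", "amod"}) + [head]
    return " ".join(t.text for t in sorted(parts, key=lambda t: t.idx))


def extract_relation(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Return (entity1, main verb, entity2), or None if any part is missing."""
    # Step 1: the main verb is taken to be the verbal root of the parse.
    root = next((t for t in tokens if t.dep == "ROOT" and t.pos == "VERB"), None)
    if root is None:
        return None
    # Step 2: the first entity is the subject of the main verb.
    subjects = children(tokens, root.idx, {"nsubj", "nsubjpass"})
    # Step 3: the second entity is the direct object, or failing that,
    # the object of a prepositional phrase attached to the verb.
    objects = children(tokens, root.idx, {"dobj"})
    if not objects:
        for prep in children(tokens, root.idx, {"prep"}):
            objects = children(tokens, prep.idx, {"pobj"})
            if objects:
                break
    if not subjects or not objects:
        return None
    return (phrase(tokens, subjects[0]), root.text, phrase(tokens, objects[0]))


# Hypothetical, hand-annotated parse of "Green tea prevents breast cancer".
SENTENCE = [
    Token(0, "Green", "ADJ", 1, "amod"),
    Token(1, "tea", "NOUN", 2, "nsubj"),
    Token(2, "prevents", "VERB", 2, "ROOT"),
    Token(3, "breast", "NOUN", 4, "compound"),
    Token(4, "cancer", "NOUN", 2, "dobj"),
]

print(extract_relation(SENTENCE))
```

The sketch handles only a single clause with one main verb. Clauses, conjunctions, and semantic role analysis, which the abstract names as part of the proposed approach, are deliberately left out.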

Full Text