Abstract

Background: Current state-of-the-art approaches to biological event extraction train statistical models in a supervised manner on corpora annotated with event triggers and event-argument relations. Inspecting such corpora, we observe that there is ambiguity in the span of event triggers (e.g., “transcriptional activity” vs. “transcriptional”), leading to inconsistencies across event trigger annotations. Such inconsistencies make it quite likely that similar phrases are annotated with different spans of event triggers, suggesting the possibility that a statistical learning algorithm misses an opportunity for generalizing from such event triggers.

Methods: We anticipate that adjustments to the span of event triggers to reduce these inconsistencies would meaningfully improve the present performance of event extraction systems. In this study, we look into this possibility with the corpora provided by the 2009 BioNLP shared task as a proof of concept. We propose an Informed Expectation-Maximization (EM) algorithm, which trains models using the EM algorithm with a posterior regularization technique that consults the gold-standard event trigger annotations in the form of constraints. We further propose four constraints on the possible event trigger annotations to be explored by the EM algorithm.

Results: The algorithm is shown to outperform the state-of-the-art algorithm on the development corpus in a statistically significant manner and on the test corpus by a narrow margin.

Conclusions: The analysis of the annotations generated by the algorithm shows that there are various types of ambiguity in event annotations, even though they could be small in number.

Highlights

  • Current state-of-the-art approaches to biological event extraction train statistical models in a supervised manner on annotated corpora, in which event triggers (the expressions indicative of events) and event-argument relations (the relations between events and their participants) are annotated (e.g., [1, 2])

  • Similar phrases may have their event triggers annotated with different spans, so that the triggers are characterized differently in syntactic terms; this suggests that a statistical learning algorithm may struggle to generalize from event triggers that are similar but annotated differently in the training corpus

Summary

Methods

Following Björne and colleagues [5], we viewed the event extraction task as constructing directed graphs, where event triggers and event-argument relations are encoded with labeled nodes and edges, respectively. Turning to the labels of edges, a question arises as to whether an edge can be labeled with more than one role type, that is, whether an event takes a protein or another event as both THEME and CAUSE. To answer this question, we constructed graphs for the sentences in the training corpus of 800 annotated abstracts with the Head-Word rule.

The training algorithm begins with an initial model with all weights set to 0 (line 1). It takes several passes over the training corpus D = ((x1, y1), ..., (xN, yN)), where xi and yi are the i-th sentence and the gold-standard graphs that are automatically derived from the gold-standard event annotations using the Head-Word rule, respectively (line 2).

In sentence (5), those graphs without any one of the event triggers of these two Positive Regulation events would violate the distance constraint with β ≤ 3.
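The graph view of event extraction described in this summary can be illustrated with a small Python sketch. It is not the authors' implementation: the `EventGraph` class, its method names, and the toy token indices are hypothetical, chosen only to show how labeled nodes and edges might be stored and how one could check whether any edge carries more than one role type (for example, both THEME and CAUSE).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EventGraph:
    """Directed graph for one sentence (hypothetical illustration).

    Nodes are token indices labeled with an event type or "Protein";
    edges run from a trigger token to an argument token and carry role labels.
    """
    node_labels: dict = field(default_factory=dict)
    # (trigger token, argument token) -> set of role labels on that edge
    edge_roles: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, token: int, label: str) -> None:
        self.node_labels[token] = label

    def add_edge(self, trigger: int, argument: int, role: str) -> None:
        self.edge_roles[(trigger, argument)].add(role)

    def multi_role_edges(self) -> list:
        """Return, in sorted order, the edges labeled with more than one role."""
        return sorted(e for e, roles in self.edge_roles.items() if len(roles) > 1)


# Toy example (assumed data): token 2 is the head word of a trigger,
# tokens 0 and 5 are proteins.
graph = EventGraph()
graph.add_node(2, "Positive_regulation")
graph.add_node(0, "Protein")
graph.add_node(5, "Protein")
graph.add_edge(2, 5, "THEME")
graph.add_edge(2, 0, "CAUSE")

# Each edge here has a single role, so the list of multi-role edges is empty.
print(graph.multi_role_edges())
```

Running `multi_role_edges()` over graphs built from every sentence of a corpus would be one way to answer the question of whether multi-label edges occur at all; how the actual graphs are derived with the Head-Word rule is not covered by this sketch.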

Background
Results and Discussion
Conclusion
