Abstract

BackgroundWe explore techniques for performing model combination between the UMass and Stanford biomedical event extraction systems. Both sub-components address event extraction as a structured prediction problem, and use dual decomposition (UMass) and parsing algorithms (Stanford) to find the best scoring event structure. Our primary focus is on stacking where the predictions from the Stanford system are used as features in the UMass system. For comparison, we look at simpler model combination techniques such as intersection and union which require only the outputs from each system and combine them directly.ResultsFirst, we find that stacking substantially improves performance while intersection and union provide no significant benefits. Second, we investigate the graph properties of event structures and their impact on the combination of our systems. Finally, we trace the origins of events proposed by the stacked model to determine the role each system plays in different components of the output. We learn that, while stacking can propose novel event structures not seen in either base model, these events have extremely low precision. Removing these novel events improves our already state-of-the-art F1 to 56.6% on the test set of Genia (Task 1). Overall, the combined system formed via stacking ("FAUST") performed well in the BioNLP 2011 shared task. The FAUST system obtained 1st place in three out of four tasks: 1st place in Genia Task 1 (56.0% F1) and Task 2 (53.9%), 2nd place in the Epigenetics and Post-translational Modifications track (35.0%), and 1st place in the Infectious Diseases track (55.6%).ConclusionWe present a state-of-the-art event extraction system that relies on the strengths of structured prediction and model combination through stacking. Akin to results on other tasks, stacking outperforms intersection and union and leads to very strong results. The utility of model combination hinges on complementary views of the data, and we show that our sub-systems capture different graph properties of event structures. Finally, by removing low precision novel events, we show that performance from stacking can be further improved.

Highlights

  • We explore techniques for performing model combination between the UMass and Stanford biomedical event extraction systems

  • We aim to uncover the answers to these related questions: Which strategies are effective and why? How do the outputted event structures change after performing model combination? are there systematic errors which can be corrected to improve performance further?

  • Note that for the Epigenetics and Post-translational Modifications (EPI) and Infectious Diseases (ID) tasks we show the CORE metric next to the official FULL metric

Read more

Summary

Introduction

We explore techniques for performing model combination between the UMass and Stanford biomedical event extraction systems. Both sub-components address event extraction as a structured prediction problem, and use dual decomposition (UMass) and parsing algorithms (Stanford) to find the best scoring event structure. Most approaches to the BioNLP event extraction task [1,2] use a single model to produce their output. Using a weighted voting scheme to combine the outputs from the top six systems, the task organizers obtained a 4% absolute F1 improvement over the best system used in isolation. We aim to uncover the answers to these related questions: Which strategies are effective and why? How do the outputted event structures change after performing model combination? are there systematic errors which can be corrected to improve performance further?

Objectives
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call