Abstract

The Message Understanding Conferences (MUCs) represent one of the earliest and longest-running efforts to evaluate language understanding technology. This article reviews the history of the MUCs and their evolution towards the use of common training and blind test sets, automated scoring, task decomposition into modular building blocks, and tools for portability across languages and applications. Now that evaluation has become an accepted part of the developer's toolkit, it is important to understand the interplay between evaluation methods and the state of research. MUC was successful in generating excitement about text processing problems and in attracting talented researchers to the area. It also provided a functional decomposition of the information extraction problem into a series of simpler problems, thus allowing researchers to demonstrate successful systems and to spin off commercial products. However, the ultimate goal of accurate information extraction has been elusive: systems have become faster and cheaper to build and the evaluations have become harder, but overall accuracy in information extraction has improved only modestly. The MUC experience contrasts with experiences in other evaluations. For example, the spoken evaluation in the Air Travel Information System (ATIS) has shown dramatic improvement in error rate over time, but those evaluations were limited to a single domain and the metrics did not address interaction, even though real-time interactive systems were available. Looking across the history of MUC in the context of related evaluations, we can draw important lessons about the need for evaluation to evolve with the technology it evaluates, to balance costs against benefits, and to weigh the divergent needs of the multiple stakeholders (developers, funders and users) in order to provide continuity while also providing the next set of challenges to the research community.
