Abstract
Shared task evaluation campaigns represent a well-established form of competitive evaluation, an important opportunity to propose and tackle new challenges for a specific research area, and a way to foster the development of benchmarks, tools and resources. The advantages of this approach are evident in any experimental field, including Natural Language Processing. An outlook on state-of-the-art language technologies for Italian can be obtained by reflecting on the results of the recently held workshop “Evaluation of NLP and Speech Tools for Italian”, EVALITA 2014. The motivations underlying individual shared tasks, the level of knowledge and development achieved within each of them, the impact on applications, society and the economy at large, as well as directions for future research, will be discussed from this perspective.
Highlights
Evaluation of achieved results is a crucial part of scientific research
For the DPIE task, the standard evaluation in terms of Labeled Attachment Score (LAS) / Unlabeled Attachment Score (UAS) computed on individual attachments does not always seem to correlate with the evaluation based on semantically oriented relations, which are more relevant for Information Extraction applications, as suggested among others by [79] (a minimal sketch of how LAS/UAS are computed follows these highlights)
It can be observed that, in this case, there is significant overlap among the outlines: low-scoring relations are hard to predict for every participating system, though to different extents
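For readers unfamiliar with the metrics mentioned above, the following minimal Python sketch (not taken from the paper; the function name, token indices and dependency labels are illustrative only) shows how LAS and UAS are conventionally computed over individual attachments: UAS counts tokens whose predicted head matches the gold head, while LAS additionally requires the dependency label to match.

```python
def attachment_scores(gold, predicted):
    """gold and predicted are lists of (head_index, label) pairs,
    one pair per token, aligned by position."""
    assert len(gold) == len(predicted)
    # UAS: predicted head index matches the gold head index
    uas_hits = sum(1 for (gh, _), (ph, _) in zip(gold, predicted) if gh == ph)
    # LAS: both the head index and the dependency label match
    las_hits = sum(1 for (gh, gl), (ph, pl) in zip(gold, predicted)
                   if gh == ph and gl == pl)
    n = len(gold)
    return uas_hits / n, las_hits / n

# Illustrative example (made-up attachments, not EVALITA data):
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")  # UAS = 1.00, LAS = 0.67
```

Note that both scores are averaged over individual attachments, which is precisely why, as the highlight above points out, they need not track an evaluation that aggregates over semantically oriented relations.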
Summary
Evaluation of achieved results is a crucial part of scientific research. This applies to the area of Natural Language Processing (NLP): establishing a well-grounded evaluation methodology makes it easier to track advances in the field and to assess the impact of the work done. Comparing the results of different systems is not a trivial task, as many parameters can affect and influence this process. To overcome this issue, over the last ten years shared task evaluation campaigns have become increasingly popular as a competitive form of evaluation. Shared task evaluation campaigns represent an important opportunity to investigate ways to tackle the challenges a specific research area is facing, where different approaches to a well-defined problem are compared based on their performance on the same task with respect to the same dataset. The datasets used within evaluation campaigns become reference resources of the scientific community and are used to assess the effectiveness and performance of a given system or technology with respect to a specific task.