Illinois CCG LoReHLT 2016 named entity recognition and situation frame systems

Chen-Tse Tsai, Stephen Mayhew, Mark Sammons, Dan Roth, Yangqiu Song

doi:10.1007/s10590-017-9211-5

Abstract

This paper describes the Illinois Cognitive Computation Group’s system for the 2016 NIST Low Resource Human Language Technology (LoReHLT) evaluation, in which the target language is Uyghur. We participate in two tasks: named entity recognition (NER) and situation frames (SF). For NER, we develop two models. The first is a rule-based model built on knowledge obtained by inspecting monolingual documents, reading a Uyghur grammar book, and interacting with native informants. The second is a transfer model trained on labeled Uzbek data. Combining the outputs of these two models yields a significant improvement and achieves an F1 score of 60.4 on the official evaluation set. For the new SF task, we apply the dataless classification technique to build an English classifier for eight situation types, and use a Uyghur-to-English dictionary to translate the Uyghur documents. Using this classifier, we propose two frameworks for grounding situations to the locations mentioned in the text.
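To illustrate the idea behind the SF pipeline (translate word by word with a dictionary, then classify without labeled training data by comparing the document against short label descriptions), here is a minimal sketch. It is not the authors' implementation: the dictionary entries are hypothetical placeholder tokens rather than real Uyghur words, the three label descriptions are illustrative and not the official situation-type inventory, and a plain bag-of-words vector stands in for the richer semantic representation (such as Explicit Semantic Analysis) that dataless classification typically relies on.

```python
import math
from collections import Counter

# HYPOTHETICAL toy dictionary: placeholder source-language tokens mapped to
# English candidate translations. A real system would load a bilingual lexicon.
UY_EN = {
    "uy_hospital": ["hospital"],
    "uy_doctor": ["doctor", "physician"],
    "uy_water": ["water"],
    "uy_well": ["well"],
    "uy_bread": ["bread", "food"],
}

# HYPOTHETICAL short descriptions for a few illustrative situation types.
# Dataless classification needs only these descriptions, no labeled documents.
LABELS = {
    "medical assistance": "medical doctor hospital physician injury medicine",
    "water supply": "water drinking well clean water shortage",
    "food supply": "food bread hunger rice shortage",
}


def translate(tokens):
    """Word-by-word dictionary translation; tokens missing from the dictionary are dropped."""
    out = []
    for tok in tokens:
        out.extend(UY_EN.get(tok, []))
    return out


def vectorize(words):
    """Bag-of-words vector, used here as a stand-in for a semantic representation such as ESA."""
    return Counter(w.lower() for w in words)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(uy_tokens):
    """Rank situation labels by similarity between the translated document and each description."""
    doc = vectorize(translate(uy_tokens))
    scores = {lab: cosine(doc, vectorize(desc.split())) for lab, desc in LABELS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `classify(["uy_doctor", "uy_hospital"])` ranks "medical assistance" first, because the translated words "doctor", "physician", and "hospital" all appear in that label's description. Because the classifier operates entirely in English, only the dictionary has to change for a new source language.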
