Oracle and Human Baselines for Native Language Identification

Shervin Malmasi, Mark Dras, Joel Tetreault

doi:10.3115/v1/w15-0620

Open Access

https://doi.org/10.3115/v1/w15-0620

Publication Date: Jan 1, 2015

Citations: 32

Affiliation: Yahoo (United States)

#Native Language Identification #Performance Gap

Abstract

We examine different ensemble methods, including an oracle, to estimate the upper limit of classification accuracy for Native Language Identification (NLI). The oracle outperforms state-of-the-art systems by over 10%, and the results indicate that for many misclassified texts the correct class label receives a significant portion of the ensemble votes, often being the runner-up. We also present a pilot study of human performance for NLI, the first such experiment. While some participants achieve modest results on our simplified setup with 5 L1s, they do not outperform our NLI system, and this performance gap is likely to widen on the standard NLI setup.
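To make the oracle idea concrete, here is a minimal Python sketch. It assumes a common definition of an ensemble oracle (a text counts as correct if at least one ensemble member predicts the true label); the abstract does not spell out the definition, so treat this as an illustration and not as the authors' implementation. The function names and the toy votes below are hypothetical.

```python
from collections import Counter

def plurality_vote(votes):
    """Return the label with the most votes (ties go to the first label seen)."""
    return Counter(votes).most_common(1)[0][0]

def oracle_correct(votes, gold):
    """Assumed oracle rule: correct if any ensemble member predicted the gold label."""
    return gold in votes

def gold_is_runner_up(votes, gold):
    """True if the gold label has the second-highest vote count."""
    ranked = [label for label, _ in Counter(votes).most_common()]
    return len(ranked) > 1 and ranked[1] == gold

def evaluate(all_votes, gold_labels):
    """Return (plurality accuracy, oracle accuracy) over a set of texts."""
    n = len(gold_labels)
    pairs = list(zip(all_votes, gold_labels))
    plurality_acc = sum(plurality_vote(v) == g for v, g in pairs) / n
    oracle_acc = sum(oracle_correct(v, g) for v, g in pairs) / n
    return plurality_acc, oracle_acc

# Toy, invented data: each inner list holds the votes of three classifiers for one text.
toy_votes = [
    ["FRA", "FRA", "SPA"],
    ["SPA", "SPA", "ITA"],
    ["DEU", "DEU", "DEU"],
    ["ITA", "SPA", "ITA"],
]
toy_gold = ["FRA", "ITA", "TUR", "ITA"]

plurality_acc, oracle_acc = evaluate(toy_votes, toy_gold)
print(plurality_acc, oracle_acc)
```

On this toy data the second text is misclassified by the plurality vote, yet its true label is the runner-up, which is the kind of case the abstract describes. Under the assumed rule, the oracle accuracy can never fall below the plurality-vote accuracy, which is why it serves as an upper-limit estimate for the ensemble.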

Full Text