Abstract

In this paper, we present a Korean learner corpus and aim to identify features that characterize it. The corpus is based on an open writing test on various topics taken by learners of Korean (beginner, intermediate, and advanced students); the texts were manually evaluated and scored. We explore several types of features in the learner corpus, starting with the pre-processing of Korean sentences. Some features, such as the number of tokens and the correct use of functional morphemes, are measured automatically using part-of-speech tagging. Syntax-related and topic-related features are measured using automatic syntactic parsing and statistical language models. These features can be used for language proficiency identification and for other learner-corpus applications that rely on machine learning techniques.
