Abstract

This paper studies semantic parsing for interlanguage (L2), taking semantic role labeling (SRL) as a case task and learner Chinese as a case language. We first manually annotate the semantic roles for a set of learner texts to derive a gold standard for automatic SRL. Based on the new data, we then evaluate three off-the-shelf SRL systems, i.e., the PCFGLA-parser-based, neural-parser-based and neural-syntax-agnostic systems, to gauge how successful SRL for learner Chinese can be. We find two non-obvious facts: 1) the L1-sentence-trained systems perform rather badly on the L2 data; 2) the performance drop from the L1 data to the L2 data of the two parser-based systems is much smaller, indicating the importance of syntactic parsing in SRL for interlanguages. Finally, the paper introduces a new agreement-based model to explore the semantic coherency information in the large-scale L2-L1 parallel data. We then show that such information is very effective in enhancing SRL for learner texts. Our model achieves an F-score of 72.06, which is a 2.02-point improvement over the best baseline.

Highlights

  • A learner language is an idiolect developed by a learner of a second or foreign language which may preserve some features of his/her first language

  • We ask two senior students majoring in Applied Linguistics to carefully annotate some L2-L1 parallel sentences with predicate–argument structures according to the specification of Chinese PropBank (CPB; Xue and Palmer, 2009), which is developed for L1

  • Although there have been some initial studies on defining annotation specifications and building corpora for syntactic analysis, there is almost no work on semantic parsing for interlanguages


Summary

Introduction

A learner language (interlanguage) is an idiolect developed by a learner of a second or foreign language, which may preserve some features of his/her first language. We study semantic parsing for interlanguage, taking semantic role labeling (SRL) as a case task and learner Chinese as a case language. We ask two senior students majoring in Applied Linguistics to carefully annotate some L2-L1 parallel sentences with predicate–argument structures according to the specification of Chinese PropBank (CPB; Xue and Palmer, 2009), which was developed for L1. Only modest rules are needed to handle some tricky phenomena. This is quite different from syntactic treebanking for learner sentences, where defining a rich set of new annotation heuristics seems necessary (Ragheb and Dickinson, 2012; Nagata and Sakaguchi, 2016; Berzak et al., 2016).
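To make the evaluation setting concrete, the following is a minimal sketch of how an F-score over predicate–argument structures can be computed against a gold standard. It assumes exact-match scoring of labeled argument spans; the text here does not specify the scoring procedure actually used, so the tuple representation, the function name `srl_f_score`, and the toy labels are illustrative assumptions only.

```python
from typing import Set, Tuple

# Hypothetical representation of one labeled argument:
# (predicate_index, role_label, span_start, span_end)
Argument = Tuple[int, str, int, int]


def srl_f_score(gold: Set[Argument], predicted: Set[Argument]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match scoring.

    An argument counts as correct only if its predicate, role label,
    and span boundaries all match a gold argument. This is an assumed
    scoring scheme, not necessarily the one used in the paper.
    """
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: one predicate at token 1 with three gold arguments.
gold = {(1, "ARG0", 0, 0), (1, "ARG1", 2, 3), (1, "ARGM-TMP", 4, 4)}
# The system finds ARG0 correctly, gets the ARG1 span wrong, and misses ARGM-TMP.
pred = {(1, "ARG0", 0, 0), (1, "ARG1", 2, 2)}

precision, recall, f1 = srl_f_score(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F={f1:.2f}")
```

Under this scheme a span-boundary error is penalized twice, once as a false positive and once as a missed gold argument, which is one reason parsing quality can matter for SRL scores on learner text.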

