This paper investigates how the grammatical knowledge that neural language models (LMs) acquire in their first language (English) influences the learning of grammatical structures in their second language (Korean). To this end, we follow a now well-established experimental procedure: (i) pre-training transformer-based GPT-2 LMs on Korean and English datasets, (ii) further fine-tuning them on a specific set of Korean data as L1 or L2, and (iii) evaluating them on the KBLiMP test data while analyzing their linguistic generalization in L1 or L2. We find negative transfer effects when comparing English as L1 with Korean as L2. Furthermore, a trajectory analysis shows that the L2-learning LM captures linguistic features of Korean, including syntax, the syntax-semantics interface, and morphology, over the course of training. Our study of second language learning in LMs contributes to predicting the syntactic challenges that may arise from L1 interference when Korean is learned as a foreign language.
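The evaluation in step (iii) follows the standard BLiMP-style minimal-pair paradigm: an LM "passes" a pair when it assigns a higher total log-probability to the grammatical sentence than to its ungrammatical counterpart. The sketch below illustrates only this scoring logic; the per-token log-probabilities are illustrative toy numbers, not output from the models described in the paper.

```python
def sentence_logprob(token_logprobs):
    """Total log-probability of a sentence from its per-token log-probs."""
    return sum(token_logprobs)

def minimal_pair_accuracy(pairs):
    """Fraction of minimal pairs where the grammatical sentence scores higher.

    Each pair is (good_token_logprobs, bad_token_logprobs).
    """
    correct = sum(
        sentence_logprob(good) > sentence_logprob(bad)
        for good, bad in pairs
    )
    return correct / len(pairs)

# Toy example: two minimal pairs with made-up per-token scores.
pairs = [
    ([-1.2, -0.8, -2.0], [-1.5, -2.2, -2.1]),  # good: -4.0 vs bad: -5.8 -> correct
    ([-2.0, -1.0], [-1.1, -1.4]),              # good: -3.0 vs bad: -2.5 -> incorrect
]
print(minimal_pair_accuracy(pairs))  # 0.5
```

In practice, KBLiMP supplies the minimal pairs and the log-probabilities come from the fine-tuned GPT-2 models, but the accuracy computation is the same.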