Languages with more speakers tend to be harder to (machine-)learn

Alexander Koplenig, Sascha Wolfer

doi:10.1038/s41598-023-45373-z

Open Access

Journal: Scientific Reports
Publication Date: Oct 28, 2023
Citations: 1
License type: CC BY 4.0

Affiliation: Leibniz Institute for the German Language

Abstract

Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ChatGPT chatbot, show impressive performance on a wide range of linguistic tasks, thus providing cognitive science and linguistics with a computational working model to empirically study different aspects of human language. Here, we use LMs to test the hypothesis that languages with more speakers tend to be easier to learn. In two experiments, we train several LMs—ranging from very simple n-gram models to state-of-the-art deep neural networks—on written cross-linguistic corpus data covering 1293 different languages and statistically estimate learning difficulty. Using a variety of quantitative methods and machine learning techniques to account for phylogenetic relatedness and geographical proximity of languages, we show that there is robust evidence for a relationship between learning difficulty and speaker population size. However, contrary to expectations derived from previous research, our results suggest that languages with more speakers tend to be harder to learn.
