Abstract
The present study builds a database of Korean syllable frequencies and distributions as a resource for researchers in psycholinguistics and adjacent disciplines. To this end, we compiled a set of syllable token/type frequency lists, organized by word class and by position within an eojeol/headword, from the Sejong Corpus, which contains 15 million eojeols of written text. The main results are as follows. First, the frequencies follow a power law: a small number of syllables account for most tokens/types. Second, the token/type frequencies of eojeol/headword syllables tend strongly to decrease as the phonological complexity of the syllables increases. Lastly, the first and second syllables of eojeols/headwords differ substantially in their phonological and morphological properties. The database, which contains 26 different syllable frequency lists, is freely available via the GitHub repository of one of the authors.
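As an illustration of the position-wise token/type counting described in the abstract, the following is a minimal sketch and not the authors' actual pipeline. It assumes that each eojeol is a string of precomposed Hangul syllables (Unicode range U+AC00 to U+D7A3), that token frequency counts every occurrence of a syllable at a given position, and that type frequency counts the distinct eojeols carrying that syllable at that position. The function name `syllable_frequencies` and the sample data are hypothetical.

```python
from collections import Counter


def is_hangul_syllable(ch):
    """True if ch is a precomposed Hangul syllable (U+AC00..U+D7A3)."""
    return "\uac00" <= ch <= "\ud7a3"


def syllable_frequencies(eojeols, position):
    """Return (token_counts, type_counts) for the syllable at a 0-based position.

    Assumption: tokens are all occurrences; types are distinct eojeols.
    """
    token_counts = Counter()
    type_counts = Counter()
    seen_types = set()
    for eojeol in eojeols:
        # Skip eojeols that are too short or have a non-Hangul character there.
        if len(eojeol) <= position or not is_hangul_syllable(eojeol[position]):
            continue
        syllable = eojeol[position]
        token_counts[syllable] += 1
        # Count each distinct eojeol only once toward the type frequency.
        if eojeol not in seen_types:
            seen_types.add(eojeol)
            type_counts[syllable] += 1
    return token_counts, type_counts


# Hypothetical toy input; the real lists were compiled from the Sejong Corpus.
sample = ["학교에", "학교에", "학생이", "사람이", "사랑을"]
first_tokens, first_types = syllable_frequencies(sample, 0)
second_tokens, second_types = syllable_frequencies(sample, 1)
```

Sorting either counter with `most_common()` gives a rank-ordered list of the kind on which a power-law pattern would be examined.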