Abstract

The present study aims to build a database of Korean syllable frequencies and distributions as a resource for researchers in psycholinguistics and adjacent disciplines. To this end, we produced a set of syllable token and type frequency lists, broken down by word class and by position within an eojeol or headword, compiled from the Sejong Corpus, which contains 15 million eojeols of written text. The main results are as follows. First, the frequencies followed a power law: a small number of syllables accounted for most tokens and types. Second, the token and type frequencies of eojeol and headword syllables tended strongly to decrease as phonological complexity increased. Third, the first and second syllables of eojeols and headwords differed substantially in their phonological and morphological properties. The database, comprising 26 syllable frequency lists, is freely available via the GitHub repository of one of the authors.
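The distinction between token and type frequency by syllable position can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `syllable_frequencies`, the toy corpus, and the choice to skip non-Hangul characters are all assumptions made for illustration, and word-class information is ignored. Token frequency here counts every occurrence of an eojeol in the corpus, whereas type frequency counts each distinct eojeol once.

```python
from collections import Counter

# Precomposed Hangul syllables occupy U+AC00..U+D7A3 in Unicode.
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3


def is_hangul_syllable(ch):
    return HANGUL_FIRST <= ord(ch) <= HANGUL_LAST


def syllable_frequencies(eojeols):
    """Return (token_counts, type_counts), each keyed by (position, syllable).

    Hypothetical helper: positions are 1-based and non-Hangul characters
    are skipped, which is an assumption of this sketch.
    """
    token_counts = Counter()
    type_counts = Counter()
    # Collapse the corpus to distinct eojeols with their occurrence counts.
    for eojeol, n in Counter(eojeols).items():
        syllables = [ch for ch in eojeol if is_hangul_syllable(ch)]
        for pos, syl in enumerate(syllables, start=1):
            token_counts[(pos, syl)] += n   # weighted by corpus occurrences
            type_counts[(pos, syl)] += 1    # each distinct eojeol counts once
    return token_counts, type_counts


# Toy corpus (made up for illustration, not drawn from the Sejong Corpus).
toy_corpus = ["학교에", "학교에", "학생이", "교실"]
token_counts, type_counts = syllable_frequencies(toy_corpus)
```

In this toy corpus the syllable 학 in first position has a token frequency of 3 (two occurrences of 학교에 plus one of 학생이) but a type frequency of 2, since only two distinct eojeols begin with it.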
