Abstract

The Japanese lexicon is typically classified into at least three etymological strata: native, Sino-Japanese, and foreign words. In Tokyo Japanese, nouns from different strata are known to differ in both their phonotactic and their tonotactic properties. Should one analyze Tokyo Japanese nouns with a non-clustering grammar, which generates all nouns using the same phonological grammar, or with a clustering grammar, which generates nouns from different strata using different grammars? In this study, I address this question from a probabilistic, model-selection perspective: the better probabilistic grammar is the one that better balances fit to the data against the number of parameters in the grammar. Using the UCLA Phonotactic Learner, I train two kinds of MaxEnt grammars, corresponding to the non-clustering and clustering analyses. I compare the two kinds of grammar using the Bayesian Information Criterion (BIC), and show that the clustering grammars make a better trade-off between fit to data and model size than the non-clustering grammars. Consequently, the different etymological strata of the Tokyo Japanese nominal lexicon are better analyzed as being generated by different MaxEnt grammars than by a single MaxEnt grammar.
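The BIC comparison described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes the standard definition BIC = k ln(n) - 2 ln(L), and it assumes that the clustering analysis is scored by pooling log-likelihoods, parameter counts, and observation counts across the per-stratum grammars. The function names `bic`, `clustering_bic`, and `prefers_clustering` are hypothetical, and no values from the study are reproduced here.

```python
import math


def bic(log_likelihood, num_params, num_obs):
    """Standard BIC: k * ln(n) - 2 * ln(L). Lower values are better."""
    return num_params * math.log(num_obs) - 2.0 * log_likelihood


def clustering_bic(strata):
    """BIC for a clustering analysis.

    `strata` is a list of (log_likelihood, num_params, num_obs) tuples,
    one per stratum-specific grammar. ASSUMPTION: the strata are scored
    jointly by summing log-likelihoods, parameters, and observations.
    """
    total_ll = sum(ll for ll, _, _ in strata)
    total_k = sum(k for _, k, _ in strata)
    total_n = sum(n for _, _, n in strata)
    return bic(total_ll, total_k, total_n)


def prefers_clustering(non_clustering, strata):
    """True if the clustering analysis has the lower (better) BIC.

    `non_clustering` is a single (log_likelihood, num_params, num_obs) tuple.
    """
    return clustering_bic(strata) < bic(*non_clustering)
```

Under this formulation, a clustering analysis is preferred only when its gain in log-likelihood outweighs the ln(n) penalty it pays for each additional constraint weight.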

