The Japanese lexicon is typically classified into at least three etymological strata: native, Sino-Japanese, and foreign words. In Tokyo Japanese, nouns from different strata are known to have different phonotactic as well as tonotactic properties. Should one analyze Tokyo Japanese nouns with a non-clustering grammar, which generates all nouns from a single phonological grammar, or with a clustering grammar, which generates nouns from different strata with different grammars? In this study, I address this question from a probabilistic, model-selection perspective: the better probabilistic grammar is the one that better balances fit to the data against the number of parameters in the grammar. Using the UCLA Phonotactic Learner, I train two kinds of MaxEnt grammars, corresponding to the non-clustering and clustering analyses. I compare the two kinds of grammar using the Bayesian Information Criterion (BIC), and show that the clustering grammars make a better trade-off between fit to data and model size than the non-clustering grammars. Consequently, the different etymological strata of the Tokyo Japanese nominal lexicon are better analyzed as being generated by different MaxEnt grammars than by a single MaxEnt grammar.
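
The BIC comparison described above can be illustrated with a minimal sketch. It uses the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of parameters, n the number of observations, and L the maximized likelihood; lower values are better. The function names, the way the clustering grammar is scored (summing per-stratum log-likelihoods and parameter counts, assuming each noun's stratum is known), and all numbers are illustrative assumptions, not the study's actual procedure or results.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Standard BIC: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def clustering_bic(per_stratum, n_obs_total: int) -> float:
    """BIC for a set of stratum-specific grammars taken as one model.

    per_stratum: list of (log_likelihood, n_params), one pair per stratum.
    Assumption: every noun belongs to exactly one known stratum, so the
    log-likelihoods and the parameter counts simply add up.
    """
    total_ll = sum(ll for ll, _ in per_stratum)
    total_k = sum(k for _, k in per_stratum)
    return bic(total_ll, total_k, n_obs_total)


# Placeholder values, invented purely for illustration (NOT reported results).
N_NOUNS = 500
non_clustering = bic(log_likelihood=-1000.0, n_params=10, n_obs=N_NOUNS)
clustering = clustering_bic(
    [(-300.0, 10), (-280.0, 10), (-290.0, 10)],  # hypothetical three strata
    n_obs_total=N_NOUNS,
)

# The model with the lower BIC makes the better fit/size trade-off.
preferred = "clustering" if clustering < non_clustering else "non-clustering"
print(f"non-clustering BIC: {non_clustering:.2f}")
print(f"clustering BIC:     {clustering:.2f}")
print(f"preferred:          {preferred}")
```

With these placeholder numbers, the clustering model is preferred because its gain in log-likelihood outweighs the penalty for its additional parameters; with a smaller gain in fit, the same penalty would favor the single grammar.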