Abstract

The observed pronunciations or spellings of words are often explained as arising from the “underlying forms” of their morphemes. These forms are latent strings that linguists try to reconstruct by hand. We propose to reconstruct them automatically at scale, enabling generalization to new words. Given some surface word types of a concatenative language along with the abstract morpheme sequences that they express, we show how to recover consistent underlying forms for these morphemes, together with the (stochastic) phonology that maps each concatenation of underlying forms to a surface form. Our technique involves loopy belief propagation in a natural directed graphical model whose variables are unknown strings and whose conditional distributions are encoded as finite-state machines with trainable weights. We define training and evaluation paradigms for the task of surface word prediction, and report results on subsets of 7 languages.

Highlights

  • How is plurality expressed in English? Comparing cats ([kæts]), dogs ([dɔgz]), and quizzes ([kwɪzɪz]), the plural morpheme has at least three pronunciations ([s], [z], [ɪz]) and at least two spellings (-s and -es)

  • The apparent regularity of natural-language phonology was first observed by Johnson (1972), so computational phonology has generally preferred grammar formalisms that compile into finite-state machines, whether the formalism is based on rewrite rules (Kaplan and Kay, 1994) or constraints (Eisner, 2002a; Riggle, 2004)

  • We objectively evaluate our learner on its ability to predict held-out surface forms


Summary

Introduction

Comparing cats ([kæts]), dogs ([dɔgz]), and quizzes ([kwɪzɪz]), the plural morpheme has at least three pronunciations ([s], [z], [ɪz]) and at least two spellings (-s and -es). Generative linguists traditionally posit that each morpheme of a language has a single underlying representation shared across all contexts (Jakobson, 1948; Kenstowicz and Kisseberth, 1979, chapter 6). This string is a latent variable that is never observed. Variation appears when the phonology of the language maps these underlying representations (URs), in context, to surface representations (SRs) that may be easier to pronounce. The phonology is usually described by a grammar that may consist of either rewrite rules (Chomsky and Halle, 1968) or ranked constraints (Prince and Smolensky, 2004).
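
As a toy illustration of the UR-to-SR mapping described above, the following Python sketch assumes the textbook analysis of the English plural: a single underlying form /z/, with a vowel inserted after sibilants and devoicing after voiceless consonants. The dictionary of underlying forms, the two rewrite rules, and the function names are all assumptions made for illustration. The paper learns underlying forms and a stochastic phonology from data; it does not hand-write deterministic rules like these.

```python
import re

# Hypothetical underlying forms (URs), written in IPA. In the paper these
# are latent strings to be inferred; here they are simply assumed.
UNDERLYING = {"cat": "kæt", "dog": "dɔg", "quiz": "kwɪz", "PLURAL": "z"}

# Ordered rewrite rules applied at the morpheme boundary "+".
# Simplified segment classes; a real phonology would cover far more.
RULES = [
    # Epenthesis: insert [ɪ] between a stem-final sibilant and the suffix /z/.
    (re.compile(r"([szʃʒ])\+z$"), r"\1+ɪz"),
    # Devoicing: /z/ becomes [s] after a voiceless non-sibilant consonant.
    (re.compile(r"([ptkfθ])\+z$"), r"\1+s"),
]


def underlying_form(morphemes):
    """Concatenate the URs of an abstract morpheme sequence."""
    return "+".join(UNDERLYING[m] for m in morphemes)


def surface_form(ur):
    """Map a concatenated UR to a surface representation (SR)."""
    for pattern, replacement in RULES:
        ur = pattern.sub(replacement, ur)
    return ur.replace("+", "")  # morpheme boundaries are not pronounced


if __name__ == "__main__":
    for stem in ("cat", "dog", "quiz"):
        ur = underlying_form([stem, "PLURAL"])
        print(f"{stem}+PLURAL: /{ur}/ -> [{surface_form(ur)}]")
```

Under these assumptions, the single underlying suffix /z/ surfaces as [s], [z], or [ɪz] depending on the final segment of the stem, which is the kind of contextual variation that the underlying-form analysis is meant to explain.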

