Abstract

The discriminative lexicon is introduced as a mathematical and computational model of the mental lexicon. This novel theory is inspired by word and paradigm morphology but operationalizes the concept of proportional analogy using the mathematics of linear algebra. It embraces the discriminative perspective on language, rejecting the idea that words’ meanings are compositional in the sense of Frege and Russell and arguing instead that the relation between form and meaning is fundamentally discriminative. The discriminative lexicon also incorporates the insight from machine learning that end-to-end modeling is much more effective than working with a cascade of models targeting individual subtasks. The computational engine at the heart of the discriminative lexicon is linear discriminative learning: simple linear networks are used for mapping form onto meaning and meaning onto form, without requiring the hierarchies of post-Bloomfieldian ‘hidden’ constructs such as phonemes, morphemes, and stems. We show that this novel model meets the criteria of accuracy (it properly recognizes words and produces words correctly), productivity (the model is remarkably successful in understanding and producing novel complex words), and predictivity (it correctly predicts a wide array of experimental phenomena in lexical processing). The discriminative lexicon does not make use of static representations that are stored in memory and that have to be accessed in comprehension and production. It replaces static representations by states of the cognitive system that arise dynamically as a consequence of external or internal stimuli. The discriminative lexicon brings together visual and auditory comprehension as well as speech production into an integrated dynamic system of coupled linear networks.
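To make the core mechanism concrete, here is a minimal sketch, in Python with NumPy, of the endstate of linear discriminative learning: comprehension and production as simple linear mappings estimated by least squares. The cue matrix C and semantic matrix S below are toy stand-ins for the paper's triphone cues and corpus-derived semantic vectors, not its actual data.

```python
import numpy as np

# Toy cue matrix C (rows: words; columns: e.g. triphone cues) and
# semantic matrix S (rows: words; columns: semantic dimensions).
# Both are illustrative stand-ins, not the paper's matrices.
C = np.array([[1., 1., 0., 0.],
              [0., 1., 1., 0.],
              [0., 0., 1., 1.]])
S = np.array([[0.2, 0.9, 0.1],
              [0.7, 0.1, 0.4],
              [0.4, 0.5, 0.8]])

# Comprehension network F maps form onto meaning; the least-squares
# estimate uses the Moore-Penrose pseudoinverse of C.
F = np.linalg.pinv(C) @ S
S_hat = C @ F                    # predicted semantic vectors

# Production network G maps meaning onto form.
G = np.linalg.pinv(S) @ C
C_hat = S @ G                    # predicted cue (form) vectors

# Accuracy check: is each predicted semantic vector closest,
# by correlation, to its own target row of S?
for i, s_hat in enumerate(S_hat):
    r = [np.corrcoef(s_hat, s)[0, 1] for s in S]
    print(f"word {i}: best match is word {int(np.argmax(r))}")
```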

Highlights

  • Theories of language and language processing have a long history of taking inspiration from mathematics and computer science

  • As lexicality decisions do not require word identification, further improvement in predicting decision behavior should be possible by considering whether the predicted semantic vector is closest to the targeted vector, together with measures such as how densely the space around the predicted semantic vector is populated (see the sketch after this list)

  • A series of studies indicates that recognizing isolated words taken out of running speech is a nontrivial task for human listeners [121,122,123]

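As a hypothetical illustration of the two measures in the second highlight, the sketch below checks whether a predicted semantic vector is closest, by correlation, to its target vector, and gauges how densely the space around the prediction is populated. The random vectors, the noise level, and the choice of eight nearest neighbors are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

# S: illustrative target semantic vectors (1000 words, 50 dimensions);
# s_hat: a noisy prediction for word 42. All values are synthetic.
rng = np.random.default_rng(0)
S = rng.normal(size=(1000, 50))
s_hat = S[42] + rng.normal(scale=0.3, size=50)

# Correlation of the prediction with every target vector.
r = np.array([np.corrcoef(s_hat, s)[0, 1] for s in S])

# Measure 1: is the target the prediction's nearest vector?
is_closest = int(np.argmax(r)) == 42

# Measure 2 (assumed operationalization): mean correlation with the
# eight nearest semantic neighbors, excluding the best match itself.
density = np.sort(r)[::-1][1:9].mean()

print(is_closest, round(density, 3))
```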

Introduction

Theories of language and language processing have a long history of taking inspiration from mathematics and computer science. One response to the rise of deep learning has been to interpret the units on the hidden layers of deep learning networks as capturing the representations, and their hierarchical organization, familiar from standard linguistic frameworks. Anticipating the discussion of technical details, we implement linear networks (mathematically, linear mappings) that are based entirely on discrimination as the learning mechanism and that work with large numbers of features at much lower levels of representation than in current and classical models. This is followed by a brief discussion of how time can be brought into the model (Section 6), after which we discuss the implications of our results.
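The discrimination-based learning mechanism behind these linear mappings can also be implemented incrementally. The following sketch is an assumption-laden illustration rather than the paper's implementation: it uses the Widrow-Hoff (delta) rule to update the weight matrix event by event, and the learning rate eta and the toy cue and semantic vectors are hypothetical.

```python
import numpy as np

# Widrow-Hoff (delta-rule) learning: after each event, weights are
# nudged to reduce the error between predicted and observed outcomes.
# Dimensions and the learning rate eta are illustrative assumptions.
n_cues, n_dims, eta = 4, 3, 0.1
W = np.zeros((n_cues, n_dims))          # form-to-meaning network

def learn(W, c, s):
    """One learning event: cue vector c, semantic target vector s."""
    error = s - c @ W                   # prediction error for this event
    return W + eta * np.outer(c, error)

# Repeated exposure drives the mapping of c toward s; over many and
# varied events, W approaches the least-squares (endstate) solution.
c = np.array([1., 1., 0., 0.])
s = np.array([0.2, 0.9, 0.1])
for _ in range(200):
    W = learn(W, c, s)
print(np.round(c @ W, 2))               # approximately equals s
```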

Background
A Semantic Vector Space Derived from the TASA Corpus
Comprehension
Speech Production
Model Performance
Bringing in Time
General Discussion
Findings
Graph-Based Triphone Sequencing