Abstract

Audiovisual integration can facilitate speech comprehension by combining information from lip-reading with auditory speech perception. When incongruent acoustic speech is dubbed onto a video of a talking face, this integration can lead to the McGurk illusion of hearing a different phoneme than the one spoken by the voice. Several computational models of the information integration process underlying these phenomena exist. All are based on the assumption that the integration process is, in some sense, optimal. They differ, however, in whether they assume continuous or categorical internal representations. Here we develop models of audiovisual integration of phonetic information on an internal representation that is continuous and cyclical. We compare these models to the Fuzzy Logical Model of Perception (FLMP), which is based on a categorical internal representation. Using cross-validation, we show that model evaluation criteria based on goodness-of-fit are poor measures of the models’ generalization error, even when they take the number of free parameters into account. We also show that the predictive power of all the models benefits from regularization that limits the precision of the internal representation. Finally, we show that, unlike the FLMP, models based on a continuous internal representation have good predictive power when properly regularized.
