Abstract

Statistical regularities in the environment impact cognition across domains. In semantics, distributional approaches posit that similarity between words can be derived from regularities in the contexts in which they appear. Here, we study how regularities in written text impact readers' knowledge about orthography: Can similarity between characters be learned from the written environment? Adapting methods from distributional semantics, we model the contextual similarity among alphanumeric characters in a large text corpus. We find modest correlations between model-derived similarities and similarities derived from a behavioral experiment. Beyond this result, model-derived similarity from neural embedding models captures key aspects of orthographic knowledge, such as case, letter identity, and consonant–vowel status. We conclude that the text environment contains regularities that are relevant to readers and that statistical learning is a promising way for this information to be acquired. More broadly, our results imply that statistical regularities are relevant not only at the level of word semantics but also at the level of individual written characters.
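The core distributional idea can be sketched in a few lines: represent each character by the counts of characters that co-occur with it within a small window, then compare characters by cosine similarity. This is a hypothetical illustration of the general technique, not the paper's actual pipeline (which uses a large corpus and neural embedding models); the window size and toy corpus here are arbitrary assumptions.

```python
from collections import defaultdict
from math import sqrt

def char_vectors(text, window=2):
    """For each character, count the characters occurring within `window`
    positions of it (a simple co-occurrence vector, stored sparsely)."""
    vecs = defaultdict(lambda: defaultdict(int))
    for i, ch in enumerate(text):
        for j in range(max(0, i - window), min(len(text), i + window + 1)):
            if j != i:
                vecs[ch][text[j]] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy corpus for illustration only; real estimates require large text samples.
corpus = "the quick brown fox jumps over the lazy dog"
vecs = char_vectors(corpus)
print(cosine(vecs["o"], vecs["e"]))
```

On realistic amounts of text, characters that share contexts (e.g., vowels, or upper- and lowercase variants of the same letter) tend to receive more similar vectors than arbitrary character pairs, which is the kind of structure the abstract reports.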
