Mathematics is an underexplored domain of human cognition. While many studies have focused on subsets of math concepts such as numbers, fractions, or geometric shapes, few have ventured beyond these elementary domains. Here, we attempted to map out the full space of math concepts and to answer two specific questions: can distributed semantic models, such as GloVe, provide a satisfactory fit to human semantic judgements in mathematics? And how does this fit vary with education? We first analyzed all of the French and English Wikipedia pages with math content, and used a semi-automatic procedure to extract the 1000 most frequent math terms in both languages. In a second step, we collected extensive behavioral judgements of the familiarity of these terms and of the semantic similarity between them. About half of the variance in human similarity judgements was explained by vector embeddings that attempt to capture latent semantic structures based on co-occurrence statistics. Participants' self-reported level of education modulated familiarity and similarity, allowing us to create a partial hierarchy among high-level math concepts. Our results converge on the proposal of a map of math space, organized as a database of math terms with information about their frequency, familiarity, grade of acquisition, and entanglement with other concepts.