How the brain encodes complex concepts has been a longstanding mystery in neuroscience. An answer could shed new light on how the brain retrieves information from large-scale data with high efficiency and robustness. Neuroscience studies suggest that the brain represents concepts with a locality-sensitive hashing (LSH) strategy, i.e., similar concepts elicit similar responses. This finding has inspired the design of similarity-based algorithms, especially in contrastive learning. Here, we hypothesize that the brain and large neural network models, both of which use similarity-based learning rules, may share a similar semantic embedding space. To test this hypothesis, this paper proposes a functional Magnetic Resonance Imaging (fMRI) semantic learning network named BrainSem, which seeks a joint semantic latent space bridging the brain and a Contrastive Language-Image Pre-training (CLIP) model. Given that our perception is inherently cross-modal, we introduce a fuzzy (one-to-many) matching loss function to encourage the model to extract high-level semantic components from neural signals. Our results show that, using only a small set of fMRI recordings for semantic space alignment, we obtain a shared embedding that remains valid for categories unseen during training, providing potential evidence for the similarity of semantic representations between the brain and large neural networks. In a zero-shot classification task, our BrainSem achieves an 11.6% improvement over the state-of-the-art.
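To make the fuzzy (one-to-many) matching idea concrete, the following is a minimal sketch of how an fMRI encoder could be aligned to a CLIP embedding space with soft, similarity-weighted targets instead of hard one-to-one pairs. This is not the authors' implementation; the encoder architecture, temperature, and soft-target construction are illustrative assumptions.

```python
# Minimal sketch of a fuzzy (one-to-many) contrastive alignment loss between
# fMRI embeddings and CLIP embeddings. Hypothetical: encoder width, temperature,
# and the soft-target construction are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FMRIEncoder(nn.Module):
    """Maps flattened fMRI responses into the CLIP embedding space (assumed MLP)."""

    def __init__(self, n_voxels: int, clip_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_voxels, 1024),
            nn.GELU(),
            nn.Linear(1024, clip_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)


def fuzzy_contrastive_loss(
    brain_emb: torch.Tensor,   # (B, D) normalized fMRI embeddings
    clip_emb: torch.Tensor,    # (B, D) normalized CLIP embeddings
    tau: float = 0.07,         # temperature (assumed value)
) -> torch.Tensor:
    """One-to-many matching: rather than a hard identity target, each brain
    embedding is softly matched to all CLIP embeddings in the batch, weighted
    by how semantically similar those CLIP embeddings are to its own pair."""
    logits = brain_emb @ clip_emb.t() / tau                # brain-to-CLIP similarities
    with torch.no_grad():
        # Soft targets from CLIP-CLIP similarity: semantically close stimuli
        # share probability mass instead of being treated as pure negatives.
        soft_targets = F.softmax(clip_emb @ clip_emb.t() / tau, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = FMRIEncoder(n_voxels=4000)
    fmri = torch.randn(8, 4000)                            # toy fMRI batch
    clip_emb = F.normalize(torch.randn(8, 512), dim=-1)    # toy CLIP embeddings
    loss = fuzzy_contrastive_loss(encoder(fmri), clip_emb)
    print(f"fuzzy matching loss: {loss.item():.4f}")
```

The soft targets let semantically related stimuli share credit during alignment, which is one plausible way to realize the cross-modal, one-to-many matching described in the abstract.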