Abstract

Voxel-wise visual encoding models based on convolutional neural networks (CNNs) have emerged as prominent tools for predicting human brain activity measured with functional magnetic resonance imaging (fMRI). While CNN-based models imitate the hierarchical structure of the human visual cortex and generate explainable features in response to natural visual stimuli, there is still a need for a brain-inspired model that predicts brain responses accurately from such biomedical data. To bridge this gap, we propose a response prediction module, the Structurally Constrained Multi-Output (SCMO) module, which incorporates the homologous correlations that arise among groups of voxels within a cortical region to yield more accurate response predictions. This module uses all the responses across a visual area to predict individual voxel-wise BOLD responses, and therefore accounts for the population activity and collective behavior of voxels. It captures the relationships within each visual region by learning a structure matrix that represents the underlying voxel-to-voxel interactions. Moreover, since the response prediction module in visual encoding tasks relies on image features, we conducted experiments with two different feature extraction modules to assess the predictive performance of the proposed module: a recurrent CNN that integrates both feedforward and recurrent interactions, and the widely used AlexNet model, which relies on feedforward connections only. Finally, we demonstrate that the proposed framework provides reliable predictions of brain responses across multiple visual areas, outperforming benchmark models in terms of stability and coherency of features.
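The abstract does not specify how the SCMO readout is parameterized, so the following is only a minimal PyTorch sketch of the general idea: a multi-output readout that predicts all voxels in a region jointly and couples them through a learned voxel-to-voxel structure matrix. The class name `SCMOReadout`, the near-identity initialization of the structure matrix, and the use of a frozen AlexNet backbone are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SCMOReadout(nn.Module):
    """Hypothetical sketch of a structurally constrained multi-output readout.

    Maps image features to BOLD responses for all voxels in one visual
    region at once, then mixes the per-voxel predictions through a learned
    voxel-to-voxel "structure" matrix (an assumed parameterization).
    """

    def __init__(self, feature_dim: int, n_voxels: int):
        super().__init__()
        # Independent linear readout: one weight vector per voxel.
        self.readout = nn.Linear(feature_dim, n_voxels)
        # Learnable voxel-to-voxel coupling within the region, initialized
        # near the identity so training starts close to a voxel-independent model.
        self.structure = nn.Parameter(torch.eye(n_voxels))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, feature_dim) from a frozen CNN backbone.
        independent = self.readout(features)       # (batch, n_voxels)
        coupled = independent @ self.structure.T   # mix voxels via the structure matrix
        return coupled


if __name__ == "__main__":
    # Example usage with a frozen AlexNet feature extractor (assumed backbone).
    from torchvision import models

    backbone = models.alexnet(weights="IMAGENET1K_V1").features.eval()
    images = torch.randn(4, 3, 224, 224)
    with torch.no_grad():
        feats = backbone(images).flatten(1)        # (4, 256 * 6 * 6)

    model = SCMOReadout(feature_dim=feats.shape[1], n_voxels=500)
    predicted_bold = model(feats)                  # (4, 500)
    print(predicted_bold.shape)
```

In this sketch the structure matrix plays the role described in the abstract, letting the prediction for each voxel draw on the responses predicted for every other voxel in the same region; the actual constraint or regularization the authors place on that matrix is not given in the abstract.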
