Abstract

Encoding models based on deep convolutional neural networks (DCNNs) predict BOLD responses to natural scenes in the visual system more accurately than any other currently available model. However, DCNN-based encoding models fail to predict a significant amount of variance in the activity of most voxels in all visual areas. This failure could reflect limitations in the data (e.g., a noise ceiling) or limitations of the DCNN as a model of computation in the brain. Understanding the source and structure of the unexplained variance could therefore provide helpful clues for improving models of brain computation. Here, we characterize the structure of the variance that DCNN-based encoding models cannot explain. Using a publicly available dataset of BOLD responses to natural scenes, we determined whether the source of unexplained variance was shared across voxels, individual brains, retinotopic locations, and hierarchically distant visual brain areas. We addressed these questions using voxel-to-voxel (vox2vox) models that predict activity in a target voxel given activity in a population of source voxels. We found that simple linear vox2vox models increased within-subject prediction accuracy over DCNN-based models for every pair of source/target visual areas, clearly demonstrating that the source of unexplained variance is widely shared within and across visual brain areas. However, vox2vox models were not more accurate than DCNN-based models when source and target voxels came from separate brains, demonstrating that the source of unexplained variance is not shared across brains. Furthermore, the weights of these vox2vox models permitted explicit readout of the receptive field location of target voxels, demonstrating that the source of unexplained variance induces correlations primarily between the activities of voxels with overlapping receptive fields. Finally, we found that vox2vox model prediction accuracy depended heavily on the signed hierarchical distance between source and target voxels: for feed-forward models (source area lower in the visual hierarchy than the target area), prediction accuracy decreased with hierarchical distance between source and target, but it did not decrease for feedback models. In contrast, the same analysis applied across layers of a DCNN did not reveal this feed-forward/feedback asymmetry. Given these results, we argue that the structured variance unexplained by DCNN-based encoding models is unlikely to be entirely caused by spatially correlated noise or eye movements; rather, our results point to a need for brain models that include endogenous dynamics and a pattern of connectivity that is not strictly feed-forward.
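
To make the vox2vox idea concrete, the following is a minimal sketch (not the authors' code) of a simple linear vox2vox model: ridge regression mapping a population of source-voxel responses to each target voxel's response, scored by held-out prediction accuracy as encoding models typically are. The array shapes, the ridge penalty, the random data, and all variable names are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a linear voxel-to-voxel (vox2vox) model under assumed shapes/penalty.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_trials, n_source, n_target = 1000, 500, 50     # hypothetical sizes
X = rng.standard_normal((n_trials, n_source))    # source-voxel responses per trial
Y = rng.standard_normal((n_trials, n_target))    # target-voxel responses per trial

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# One linear map per target voxel, fit jointly via multi-output ridge regression.
vox2vox = Ridge(alpha=1.0).fit(X_tr, Y_tr)

# Prediction accuracy per target voxel: Pearson r between predicted and
# held-out responses, the usual encoding-model accuracy metric.
Y_hat = vox2vox.predict(X_te)
r = np.array([np.corrcoef(Y_hat[:, j], Y_te[:, j])[0, 1] for j in range(n_target)])
print(f"median held-out r across target voxels: {np.median(r):.3f}")
```

In this framing, the fitted weight matrix (`vox2vox.coef_`) is what would support the receptive-field readout described above: each target voxel's weights over source voxels can be inspected for spatial structure.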
