Deep Residual Network Predicts Cortical Representation and Organization of Visual Features for Rapid Categorization

Haiguang Wen, Junxing Shi, Wei Chen, Zhongming Liu

doi:10.1038/s41598-018-22160-9

Abstract

The brain represents visual objects with topographic cortical patterns. To address how distributed visual representations enable object categorization, we established predictive encoding models based on a deep residual network, and trained them to predict cortical responses to natural movies. Using this predictive model, we mapped human cortical representations of 64,000 visual objects from 80 categories with high throughput and accuracy. These representations covered both the ventral and dorsal pathways, reflected multiple levels of object features, and preserved semantic relationships between categories. Across the entire visual cortex, object representations were organized into three clusters of categories: biological objects, non-biological objects, and background scenes. At a finer scale specific to each cluster, object representations revealed sub-clusters for further categorization. This hierarchical clustering of category representations was driven mostly by cortical representations of middle- to high-level object features. In summary, this study demonstrates a useful computational strategy to characterize the cortical organization and representations of visual features for rapid categorization.

Highlights

Advances in deep learning[21] have established a range of deep neural networks (DNN) inspired by the brain itself[4,22]
This study demonstrates a high-throughput computational strategy to characterize hierarchical, distributed, and overlapping cortical representations of visual objects and categories
Results suggest that information about visual-object category entails multiple levels and domains of features represented by distributed cortical patterns in both ventral and dorsal pathways

Introduction

Advances in deep learning[21] have established a range of deep neural networks (DNNs) inspired by the brain itself[4,22]. Extending recent studies[23,24,25,26,27,29], we used a deep residual network (ResNet)[37] to define, train, and test a generalizable, predictive, and hierarchical model of natural vision, using extensive fMRI data from humans watching >10 hours of natural videos. Treating this predictive model as a “virtual” fMRI scanner, we synthesized cortical response patterns to 64,000 natural pictures containing objects from 80 categories, and mapped the cortical representations of these categories with high throughput. Consistent with, but complementary to, prior experimental studies[12,15,16,32,38,39,40,41,42,43], this study used a model-based computational strategy to study how cortical representations of various levels of object knowledge subserve categorization.
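To illustrate the encode-then-synthesize strategy described above, the following Python sketch fits a linear encoding model from image features to voxel responses, then uses it as a "virtual scanner" to estimate category-average response patterns. It is a toy under stated assumptions, not the authors' implementation: random vectors stand in for ResNet features and fMRI data, ridge regression is an assumed fitting method (the text above does not specify one), and all names and sizes (`fit_ridge`, `category_maps`, 50 features, 30 voxels, 4 categories) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; real feature and voxel counts would be far larger.
n_train, n_test, n_features, n_voxels = 400, 100, 50, 30
n_categories, n_images_per_category = 4, 25

# Synthetic stand-ins: "network features" of movie frames and noisy
# "voxel responses" generated from a hidden linear mapping.
true_weights = rng.normal(size=(n_features, n_voxels))


def simulate(n_samples):
    features = rng.normal(size=(n_samples, n_features))
    noise = 0.5 * rng.normal(size=(n_samples, n_voxels))
    return features, features @ true_weights + noise


def fit_ridge(features, responses, alpha=1.0):
    """Closed-form ridge regression; one weight column per voxel (assumed method)."""
    gram = features.T @ features + alpha * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ responses)


def columnwise_corr(a, b):
    """Pearson correlation between matching columns of a and b."""
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    denom = np.sqrt((a_c ** 2).sum(axis=0) * (b_c ** 2).sum(axis=0))
    return (a_c * b_c).sum(axis=0) / denom


# 1) Train the encoding model and check it on held-out data.
train_features, train_responses = simulate(n_train)
test_features, test_responses = simulate(n_test)
weights = fit_ridge(train_features, train_responses)
voxel_accuracy = columnwise_corr(test_features @ weights, test_responses)

# 2) Use the trained model as a "virtual scanner": predict responses to
#    new images and average the predictions within each category.
category_maps = np.empty((n_categories, n_voxels))
for c in range(n_categories):
    category_center = rng.normal(size=n_features)  # hypothetical category signature
    image_features = category_center + 0.3 * rng.normal(
        size=(n_images_per_category, n_features)
    )
    category_maps[c] = (image_features @ weights).mean(axis=0)

# 3) Compare categories by the similarity of their predicted cortical patterns.
category_similarity = np.corrcoef(category_maps)

print("mean held-out voxel correlation:", voxel_accuracy.mean())
print("category similarity matrix shape:", category_similarity.shape)
```

In this sketch, `category_similarity` plays the role of the between-category comparison: a matrix of this kind could then be passed to a clustering method of the reader's choice to look for groupings of categories.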

Methods

Results

Conclusion