Abstract

Vision science, particularly machine vision, has been revolutionized by introducing large-scale image datasets and statistical learning approaches. Yet, human neuroimaging studies of visual perception still rely on small numbers of images (around 100) due to time-constrained experimental procedures. To apply statistical learning approaches that include neuroscience, the number of images used in neuroimaging must be significantly increased. We present BOLD5000, a human functional MRI (fMRI) study that includes almost 5,000 distinct images depicting real-world scenes. Beyond dramatically increasing image dataset size relative to prior fMRI studies, BOLD5000 also accounts for image diversity, overlapping with standard computer vision datasets by incorporating images from the Scene UNderstanding (SUN), Common Objects in Context (COCO), and ImageNet datasets. The scale and diversity of these image datasets, combined with a slow event-related fMRI design, enables fine-grained exploration into the neural representation of a wide range of visual features, categories, and semantics. Concurrently, BOLD5000 brings us closer to realizing Marr’s dream of a singular vision science–the intertwined study of biological and computer vision.

Highlights

  • Background & SummaryBoth human and computer vision share the goal of analyzing visual inputs to accomplish high-level tasks such as object and scene recognition[1]

  • The motivation to scale up neural datasets is to leverage and combine the wide variety of technological advances that have enabled significant, parallel progress in both biological and machine vision

  • In an attempt to understand complex neural activity arising from advanced neuroimaging techniques, high-performing computer vision systems have been touted as effective potential models of neural computation[1]

Read more

Summary

Background & Summary

Both human and computer vision share the goal of analyzing visual inputs to accomplish high-level tasks such as object and scene recognition[1]. In an attempt to understand complex neural activity arising from advanced neuroimaging techniques, high-performing computer vision systems have been touted as effective potential models of neural computation[1] This is primarily for three reasons: (1) the origin of these models is linked to the architecture of the primate visual system[4]; (2) these models learn from millions of real-world images; (3) these models achieve high-performance in diverse tasks such as scene recognition, object recognition, segmentation, detection, and action recognition– tasks defined and grounded in human judgments of correctness. One of the most significant outstanding challenges for integrating across fields is data[13] We address this data challenge with the BOLD5000 dataset, a large-scale, slow event-related human fMRI study incorporating 5,000 real-world images as stimuli. We hope that BOLD5000 engenders greater collaboration between the two fields of vision science, fulfilling Marr’s dream

Methods
Findings
Code Availability
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.