Abstract

Understanding object representations requires a broad, comprehensive sampling of the objects in our visual world with dense measurements of brain activity and behavior. Here, we present THINGS-data, a multimodal collection of large-scale neuroimaging and behavioral datasets in humans, comprising densely sampled functional MRI and magnetoencephalographic recordings, as well as 4.70 million similarity judgments in response to thousands of photographic images for up to 1,854 object concepts. THINGS-data is unique in its breadth of richly annotated objects, allowing for testing countless hypotheses at scale while assessing the reproducibility of previous findings. Beyond the unique insights promised by each individual dataset, the multimodality of THINGS-data allows combining datasets for a much broader view into object processing than previously possible. Our analyses demonstrate the high quality of the datasets and provide five examples of hypothesis-driven and data-driven applications. THINGS-data constitutes the core public release of the THINGS initiative (https://things-initiative.org) for bridging the gap between disciplines and the advancement of cognitive neuroscience.

Editor's evaluation

Hebart et al. present a landmark, multimodal massive dataset to support the study of visual object representation, including data measured from functional magnetic resonance imaging, magnetoencephalography, and behavioral similarity judgments. The compelling, condition-rich design, conducted over a thoughtfully curated and sampled set of object concepts, will be highly valuable to the cognitive/computational/neuroscience community, yielding data that will be amenable to many empirical questions beyond the field of visual object recognition.
The dataset is accompanied by quality control evaluations, as well as examples of analyses that the community can re-run and further explore for building new hypotheses that can be tested with such a rich dataset. https://doi.org/10.7554/eLife.82580.sa0

Introduction

A central goal of cognitive neuroscience is to attain a detailed characterization of the recognition and understanding of objects in the world. Over the past few decades, there has been tremendous progress in revealing the basic building blocks of human visual and semantic object processing. For example, numerous functionally selective clusters have been identified in ventral and lateral occipitotemporal cortex that respond selectively to images of faces, scenes, objects, or body parts (Downing et al., 2001; Epstein and Kanwisher, 1998; Kanwisher et al., 1997; Malach et al., 1995). Likewise, several coarse-scale gradients have been revealed that span across these functionally selective regions and that reflect low-level visual properties such as eccentricity or curvature (Arcaro et al., 2015; Groen et al., 2022; Yue et al., 2020), mid-to-high-level properties such as animacy or size (Caramazza and Shelton, 1998; Konkle and Caramazza, 2013; Konkle and Oliva, 2012; Kriegeskorte et al., 2008b), or high-level semantics (Huth et al., 2012). These results have been complemented by studies in the temporal domain, revealing a temporal cascade of object-related responses that become increasingly invariant over time to visually specific features such as size and position (Isik et al., 2014), that reflect differences between visual and more abstract semantic properties (Bankson et al., 2018; Cichy et al., 2014; Clarke et al., 2013; Clarke et al., 2015), and that reveal the dynamics of feedforward and feedback processing (Boring et al., 2022; Kietzmann et al., 2019; Mohsenzadeh et al., 2018).
These spatial and temporal patterns of object-related brain activity have been linked to categorization behavior (Grootswagers et al., 2018; Ritchie et al., 2015) and perceived similarity (Bankson et al., 2018; Cichy et al., 2019; Mur et al., 2013), indicating their direct relevance for overt behavior. Despite these advances, our general understanding of the processing of visually-presented objects has remained incomplete. One major limitation stems from the enormous variability of the visual world and the thousands of objects that we can identify and distinguish (Biederman, 1985; Hebart et al., 2019). Different objects are characterized by a large and often correlated set of features (Groen et al., 2017; Naselaris et al., 2021), making it challenging to determine the overarching properties that govern the representational structure in visual cortex and behavior. A more complete understanding of visual and semantic object processing will almost certainly require a high-dimensional account (Naselaris et al., 2021; Haxby et al., 2011; Hebart et al., 2020; Lehky et al., 2014), which is impossible to derive from traditional experiments that are based only on a small number of stimuli or a small number of categories. Likewise, even large-scale datasets remain limited in the insights they can yield about object representations when they lack a systematic sampling of object categories and images. To overcome these limitations, here we introduce THINGS-data, which consists of three multimodal large-scale datasets of brain and behavioral responses to naturalistic object images. There are three key aspects of THINGS-data that maximize its utility and set it apart from other large-scale datasets using naturalistic images (Allen et al., 2022; Chang et al., 2019; Horikawa and Kamitani, 2017; Kay et al., 2008). 
First, THINGS-data is unique in that it offers a broad, comprehensive and systematic sampling of object representations for up to 1854 diverse nameable manmade and natural object concepts. This is in contrast to previous large-scale neuroimaging datasets that focused primarily on dataset size, not sampling, and that often contain biases towards specific object categories (Allen et al., 2022; Chang et al., 2019). Second, THINGS-data is multimodal, containing functional MRI, magnetoencephalography (MEG) and behavioral datasets allowing analyses of both the spatial patterns and temporal dynamics of brain responses (Ghuman and Martin, 2019) as well as their relationship to behavior. In particular, THINGS-data comes with 4.70 million behavioral responses that capture the perceived similarity between objects with considerable detail and precision. Third, the THINGS database of object concepts and images (Hebart et al., 2019) comes with a growing body of rich annotations and metadata, allowing for direct comparisons of representations across domains, an extension to other methods and species (Kriegeskorte et al., 2008a), streamlined incorporation of computational modeling frameworks (Kriegeskorte and Douglas, 2018), and direct testing of diverse hypotheses on these large-scale datasets. In this paper, we provide a detailed account of all aspects of THINGS-data, from acquisition and data quality checks to exemplary analyses demonstrating the potential utility of the data. These exemplary analyses primarily serve to highlight potential research directions that could be explored with these data. In addition, the analyses of the neuroimaging data reveal high reliability of findings across individual participants, underscoring the utility of densely sampling a small number of individuals. Finally, they replicate a large number of research findings, suggesting that these data can be used for revealing new insights into visual and semantic processing in human brain and behavior. 
We expect that THINGS-data will serve as an important resource for the community, enabling novel analyses to provide significant insights into visual object processing as well as validation and extension of existing findings. THINGS-data reflects the core release of datasets as part of the THINGS initiative (https://things-initiative.org), which will provide numerous multimodal and multispecies behavioral, neurophysiology, and neuroimaging datasets based on the same images, offering an important general resource that bridges the gap between disciplines for the advancement of the cognitive neurosciences.

Results

A multimodal collection of datasets of object representations in brain and behavior

We collected three datasets that extensively sampled object representations using functional MRI (fMRI), magnetoencephalography (MEG), and behavior (Figure 1). To this end, we drew on the THINGS database (Hebart et al., 2019), a richly annotated database of 1854 object concepts representative of the American English language, which contains 26,107 manually curated naturalistic object images. The comprehensive set of object categories, the large number of high-quality naturalistic images, and the rich set of semantic and image annotations make THINGS ideally suited for the large-scale collection of imaging and behavioral datasets.

Figure 1 (with 1 supplement). Overview of datasets. (A) THINGS-data comprises MEG, fMRI and behavioral responses to large samples of object images taken from the THINGS database. (B) In the fMRI and MEG experiment, participants viewed object images while performing an oddball detection task (synthetic image). (C) The behavioral dataset comprises human similarity judgements from an odd-one-out task where participants chose the most dissimilar object amongst three options. (D) The fMRI dataset contains extensive additional imaging data.
(E) The MEG dataset provides high temporal resolution of neural response measurements in 272 channels. The butterfly plot shows the mean stimulus-locked response in each channel for four example sessions in one of the participants.

During the fMRI and MEG experiments, participants were shown a representative subset of THINGS images, spread across 12 separate sessions (fMRI: N=3, 8740 unique images of 720 objects; MEG: N=4, 22,448 unique images of 1854 objects). Images were shown in fast succession (fMRI: 4.5 s; MEG: 1.5±0.2 s; Figure 1B), and participants were instructed to maintain central fixation. Please note that for the MEG and fMRI experiments, we chose non-overlapping sets of participants to ensure they had not seen individual images before and thus to minimize potential memory effects on measured object representations. To ensure engagement, participants performed an oddball detection task responding to occasional artificially generated images. A subset of images (fMRI: n=100; MEG: n=200) were shown repeatedly in each session to estimate noise ceilings (Lage-Castellanos et al., 2019) and to provide a test set for model evaluation (see Appendix 1 for details on the concept and image selection strategy).

Beyond the core functional imaging data in response to THINGS images, additional structural and functional imaging data as well as eye-tracking and physiological responses were gathered. Specifically, for MEG, we acquired T1-weighted MRI scans to allow for cortical source localization. Eye movements were monitored during the MEG sessions to ensure participants maintained central fixation (see Appendix 2 and Appendix 2—figure 1 for extensive eye-movement-related analyses). For MRI, we collected high-resolution anatomical images (T1- and T2-weighted), measures of brain vasculature (Time-of-Flight angiography, T2*-weighted), and gradient-echo field maps.
In addition, we ran a functional localizer to identify numerous functionally specific brain regions, a retinotopic localizer for estimating population receptive fields, and an additional run without external stimulation for estimating resting-state functional connectivity. Finally, each MRI session was accompanied by physiological recordings (heartbeat and respiration) to support data denoising. Based on these additional data, we computed a variety of data derivatives for users to refine their analyses. These derivatives include cortical flatmaps which allow for visualizing statistical results on the entire cortical surface (Gao et al., 2015), independent-component based noise regressors which can be used for improving the reliability of obtained results, regions of interest for category-selective and early visual brain areas which allow for anatomically-constrained research questions, and estimates of retinotopic parameters, such as population receptive field location and size. THINGS-data also includes 4.70 million human similarity judgements collected via online crowdsourcing for 1854 object images. In a triplet odd-one-out task, participants (N=12,340) were presented with three objects from the THINGS database and were asked to indicate which object is the most dissimilar. The triplet odd-one-out task assesses the similarity of two objects in the context imposed by a third object. With a broad set of objects, this offers a principled approach for measuring context-independent perceived similarity with minimal response bias, but also allows for estimating context-dependent similarity, for example by constraining similarity to specific superordinate categories, such as animals or vehicles. 
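To make the logic of the triplet odd-one-out task concrete, the following minimal sketch estimates pairwise similarity directly from choices: for each pair of objects, it computes the fraction of triplets containing that pair in which the third object was picked as the odd one out. This is a simple count-based illustration, not the embedding model used by the authors, and the function name, the trial format, and the toy object labels are all hypothetical.

```python
from collections import defaultdict


def pairwise_similarity(trials):
    """Estimate similarity for each object pair from odd-one-out choices.

    trials: iterable of ((a, b, c), odd), where `odd` is the object chosen
    as most dissimilar (hypothetical input format).
    Returns a dict mapping frozenset({x, y}) to the fraction of triplets
    containing x and y in which neither was chosen as the odd one out.
    """
    kept_together = defaultdict(int)
    seen = defaultdict(int)
    for (a, b, c), odd in trials:
        for pair in ((a, b), (a, c), (b, c)):
            key = frozenset(pair)
            seen[key] += 1
            # The pair "stays together" when the odd one out is the third object.
            if odd not in pair:
                kept_together[key] += 1
    return {pair: kept_together[pair] / n for pair, n in seen.items()}


# Toy example with made-up choices.
toy_trials = [
    (("dog", "cat", "car"), "car"),
    (("dog", "cat", "cup"), "cup"),
    (("dog", "car", "cup"), "dog"),
]
similarity = pairwise_similarity(toy_trials)
```

Restricting `trials` to triplets drawn from one superordinate category would give a context-dependent estimate in the same way.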
An initial subset of 1.46 million of these odd-one-out judgments was reported in previous work (Hebart et al., 2020; Zheng et al., 2019), and the 4.70 million trials reported here represent a substantial increase in dataset size and the ability to draw inferences about fine-grained similarity judgments. Beyond dataset size, two notable additions are included. First, we collected age information, providing a cross-sectional sample for how mental representations may change with age. Second, we collected a set of 37,000 within-subject triplets to estimate variability at the subject level. Taken together, the behavioral dataset provides a massive set of perceived similarity judgements of object images and can be linked to neural responses measured in MEG and fMRI, opening the door to studying the neural processes underlying perceived similarity at scale, for a wide range of objects.

The remaining results section will be structured as follows: We will first describe the quality and reliability of both neuroimaging datasets, followed by the description of the quality of the behavioral dataset. Then, we will showcase the validity and suitability of the datasets for studying questions about behavioral and neural object representations. This will include multivariate pairwise decoding of hundreds of object categories, encoding analyses serving as a large-scale replication of the animacy and size organization in occipitotemporal cortex, representational similarity analysis of patterns of brain activity and perceived similarity, and a novel MEG-fMRI fusion approach based on directly regressing MEG responses onto fMRI voxel activation patterns.

Data quality and data reliability in the fMRI and MEG datasets

To be useful for addressing diverse research questions, we aimed at providing neuroimaging datasets with excellent data quality and high reliability.
To reduce variability introduced through head motion and alignment between sessions, fMRI and MEG participants wore custom head casts throughout all sessions. Figure 2 demonstrates that overall head motion was, indeed, very low in both neuroimaging datasets. In the fMRI dataset, the mean framewise displacement per run was consistently below 0.2 mm. In the MEG, head position was recorded between runs and showed consistently low head motion for all participants during sessions (median <1.5 mm). Between sessions, changes in MEG head position were slightly higher but remained overall low (median <3 mm). A visual comparison of the evoked responses for each participant across sessions in different sensor groups highlights that the extent of head motion we observed does not appear to be detrimental for data quality (see Figure 2—figure supplement 1).

Figure 2 (with 5 supplements). Quality metrics for fMRI and MEG datasets. fMRI participants are labeled F1–F3 and MEG participants M1–M4, respectively. (A) Head motion in the fMRI experiment as measured by the mean framewise displacement in each functional run of each participant. (B) Median change in average MEG head coil position as a function of the Euclidean distance of all pairwise comparisons between all runs. Results are reported separately for comparisons within sessions and between sessions (see Figure 2—figure supplement 4 for all pairwise distances). (C) fMRI voxel-wise noise ceilings in the test dataset as an estimate of explainable variance visualized on the flattened cortical surface. The labeled outlines show early visual (V1–V3) and category-selective brain regions identified based on the population receptive field mapping and localizer data, respectively. (D) MEG time-resolved noise ceilings similarly show high reliability, especially for occipital, parietal, and temporal sensors.
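For readers unfamiliar with the head-motion metric reported above for fMRI, the sketch below computes framewise displacement from six rigid-body realignment parameters. It assumes one widely used definition: the sum of absolute volume-to-volume changes, with rotations converted to millimeters on a sphere of 50 mm radius. The text does not state which definition was used here, so the formula, the radius, and the input layout should all be read as assumptions.

```python
def framewise_displacement(motion, radius_mm=50.0):
    """Framewise displacement per volume (assumed definition, see lead-in).

    motion: list of (tx, ty, tz, rx, ry, rz) per volume, with translations
    in mm and rotations in radians (hypothetical input layout).
    Returns one value per volume; the first volume is assigned 0.0.
    """
    fd = [0.0]
    for prev, cur in zip(motion, motion[1:]):
        # Absolute change in the three translation parameters (mm).
        translation = sum(abs(c - p) for p, c in zip(prev[:3], cur[:3]))
        # Rotations are converted to arc length on a sphere of radius_mm.
        rotation = sum(abs(c - p) * radius_mm for p, c in zip(prev[3:], cur[3:]))
        fd.append(translation + rotation)
    return fd


def mean_fd(motion, radius_mm=50.0):
    """Mean framewise displacement across all volumes of one run."""
    fd = framewise_displacement(motion, radius_mm)
    return sum(fd) / len(fd)


# Toy run with two volumes: 0.1 mm shift in x and 0.001 rad rotation.
toy_motion = [(0.0, 0.0, 0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.0, 0.001, 0.0, 0.0)]
toy_fd = framewise_displacement(toy_motion)
```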
To further improve fMRI data quality and provide easily usable data, we conducted two additional processing steps. First, since fMRI data contains diverse sources of noise including head motion, pulse, heartbeat, and other sources of physiological and scanner-related noise, we developed a custom denoising method based on independent component analysis (Beckmann and Smith, 2004), which involved hand-labeling a subset of components and a set of simple heuristics to separate signal from noise components (see Methods for details). This approach yielded strong and consistent improvements in the reliability of single trial BOLD response estimates (Figure 1—figure supplement 1). Second, we estimated the BOLD response amplitude to each object image by fitting a single-trial regularized general linear model on the preprocessed fMRI time series with voxel-specific estimates of the HRF shape (see Methods). Together, these methods yielded much higher data reliability and provided a format that is much smaller than the original time series and that is amenable to a wider range of analysis techniques, including data-driven analyses. This reduced set of BOLD parameter estimates is used for all analyses showcased in this manuscript and is part of the publicly available data (see Data availability). To provide a quantitative assessment of the reliability of the fMRI and MEG datasets, we computed noise ceilings. Noise ceilings are defined as the maximum performance any model can achieve given the noise in the data (Lage-Castellanos et al., 2019) and are based on the variability across repeated measurements. Since noise ceiling estimates depend on the number of trials averaged in a given analysis, we estimated them separately for the 12 trial repeats of the test set and for single trial estimates. 
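The dependence of the noise ceiling on the number of averaged trials can be illustrated with a small variance-decomposition sketch for a single voxel or sensor. It assumes that the noise variance is estimated from the variability across repeats of the same image and that the signal variance is what remains of the variance across image means; this is one common formulation and not necessarily the exact estimator used by the authors. All names are hypothetical.

```python
from statistics import mean, variance


def noise_ceiling(responses, n_avg):
    """Fraction of explainable variance when averaging n_avg trials.

    responses: list with one entry per image, each a list of repeated
    measurements for a single voxel or sensor (hypothetical layout).
    """
    n_rep = len(responses[0])
    # Noise variance: average within-image variance across repeats.
    noise_var = mean(variance(reps) for reps in responses)
    # Variance of the image means contains signal plus residual noise.
    total_var = variance([mean(reps) for reps in responses])
    signal_var = max(total_var - noise_var / n_rep, 0.0)
    denom = signal_var + noise_var / n_avg
    return signal_var / denom if denom > 0 else 0.0


# Toy data: three images, three repeats each.
toy = [[1.0, 1.2, 0.8], [3.0, 3.1, 2.9], [5.0, 4.8, 5.2]]
nc_single = noise_ceiling(toy, n_avg=1)
nc_averaged = noise_ceiling(toy, n_avg=3)
```

Under these assumptions, averaging more repeats shrinks the noise term in the denominator, which is why the estimate for averaged test-set responses is higher than for single trials.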
Noise ceilings in the test set were high (Figure 2), with up to 80% explainable variance in early visual cortex for fMRI (Figure 2C) and up to 70% explainable variance in MEG (Figure 2D, Figure 2—figure supplement 2). Individual differences between participants indicated that performance was particularly high for fMRI participants F1 and F2 and MEG participants M2 and M3 but qualitatively similar for all participants. For single-trial estimates, as expected, noise ceilings were lower and varied more strongly across participants (Figure 2—figure supplement 3). This suggests that these neuroimaging datasets are ideally suited for analyses that incorporate estimates across multiple trials, such as encoding or decoding models or data-driven analyses at the level of object concepts.

Data quality and data reliability in the behavioral odd-one-out dataset: A 66-dimensional embedding captures fine-grained perceived similarity judgments

To achieve a full estimate of a behavioral similarity matrix for all 1854 objects, we would have to collect 1.06 billion triplet odd-one-out judgments. We previously demonstrated (Hebart et al., 2020) that 1.46 million trials were sufficient to generate a sparse positive similarity embedding (SPoSE) (Zheng et al., 2019) that approached noise ceiling in predicting choices in left-out trials and pairwise similarity. SPoSE yielded 49 interpretable behavioral dimensions reflecting perceptual and conceptual object properties (e.g. colorful, animal-related) and thus identified what information may be used by humans to judge the similarity of objects in this task. Yet, several important questions about the general utility of these data could not be addressed with this original dataset. First, how much data is enough to capture the core dimensions underlying human similarity judgments?
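The figure of 1.06 billion triplets quoted above is the number of unordered triplets that can be formed from 1854 objects. The sketch below reproduces that count and, as an illustration of how an embedding can predict triplet choices, computes odd-one-out probabilities from a softmax over the three pairwise dot products. The softmax choice rule is stated here as an assumption about a typical formulation of such embedding models, and the function names and toy vectors are hypothetical.

```python
import math


def n_triplets(n_objects):
    """Number of unordered triplets among n_objects items."""
    return math.comb(n_objects, 3)


def odd_one_out_probs(x_i, x_j, x_k):
    """Predicted probability that i, j, or k is the odd one out.

    Assumed choice rule: softmax over the dot products of the three pairs;
    the object left out of the most similar pair is the odd one out.
    """
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    s_ij, s_ik, s_jk = dot(x_i, x_j), dot(x_i, x_k), dot(x_j, x_k)
    shift = max(s_ij, s_ik, s_jk)  # subtract the maximum for numerical stability
    e_ij, e_ik, e_jk = (math.exp(s - shift) for s in (s_ij, s_ik, s_jk))
    z = e_ij + e_ik + e_jk
    # Pair (j, k) most similar -> i is odd; (i, k) -> j is odd; (i, j) -> k is odd.
    return (e_jk / z, e_ik / z, e_ij / z)


total = n_triplets(1854)       # all possible triplets
coverage = 4.70e6 / total      # fraction sampled by 4.70 million trials
probs = odd_one_out_probs([1.0, 0.0], [0.0, 1.0], [0.0, 1.2])
```

With 4.70 million trials, `coverage` comes out at well under 1% of all possible triplets, which is why a low-dimensional embedding is needed to generalize to unsampled triplets.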
Previously, we had shown that performance of our embedding at predicting triplet choices had saturated even with the original 1.46 million trials, yet dimensionality continued to increase with dataset size (Hebart et al., 2020). Before collecting additional data and using different subsets of the original dataset, we estimated that model dimensionality would saturate around 67.5 dimensions and would reach ~66.5 dimensions for 4.5–5 million trials (Figure 3A). Indeed, when re-running the model with the full dataset of 4.70 million trials (4.10 million for training), embedding dimensionality turned out as predicted: from a set of 72 randomly initialized models, we chose the most reliable embedding as the final embedding, revealing 66 interpretable dimensions underlying perceived similarity judgments (see Methods for details). Thus, increasing dataset size beyond this large dataset may no longer yield noticeable improvements in predictive performance or changes in embedding dimensionality at the global level of similarity, and potential improvements may not justify the cost of collecting additional data. Rather than continuing to increase dataset size, future research on representational object dimensions may therefore focus more strongly on individual differences, within-category similarity, different sensory domains, or abstracted stimuli.

Figure 3 (with 1 supplement). Behavioral similarity dataset. (A) How much data is required to capture the core representational dimensions underlying human similarity judgments? Based on the original dataset of 1.46 million triplets (Hebart et al., 2020), it was predicted that around 4.5–5 million triplets would be required for the curve to saturate. Indeed, for the full dataset, the dimensionality was found to be 66, in line with the extrapolation.
Red bars indicate histograms for dimensionality across several random model initializations, while the final model was chosen to be the most stable among this set. (B) Within-category pairwise similarity ratings were predicted better for diverse datasets using the new, larger dataset of 4.70 million triplets (4.10 million training samples), indicating that this dataset contains more fine-grained similarity information. Error bars reflect standard errors of the mean. In the final 66-dimensional embedding, many dimensions were qualitatively very similar to the original 49 dimensions (Figure 3—figure supplement 1), and some new dimensions were splits derived from previously mixed dimensions (e.g. plant-related and green) or highlighted more fine-grained aspects of previous dimensions (e.g. dessert rather than food). Overall model performance was similar yet slightly lower for the new and larger as compared to the original and smaller dataset (original: 64.60 ± 0.23%, new: 64.13 ± 0.18%), while noise ceilings were comparable (original noise ceiling dataset: 68.91 ± 1.07%, new noise ceiling datasets: 68.74 ± 1.07% and 67.67 ± 1.08%), indicating that the larger dataset was of similar quality. However, these noise ceilings were based on between-subject variability, leaving open a second question: how strongly did within-subject variability contribute to overall variability in the data? To estimate the within-subject noise ceiling, we inspected the consistency of within-subject triplet repeats. The within-subject noise ceiling was at 86.34 ± 0.46%. Even though this estimate constitutes an upper bound of the noise ceiling, since identical trials were repeated after only 16–20 triplets to compute reliability estimates, these results indicate that a lot of additional variance may be captured when accounting for differences between individuals. 
Thus, participant-specific modeling based on this new large-scale behavioral dataset may yield additional, novel insights into the nature of mental object representations.

Third, while increases in dataset size did not lead to notable improvements in overall performance, did increasing the dataset size improve more fine-grained predictions of similarity? To address this question, we used several existing datasets of within-category similarity ratings (Avery et al., 2022; Iordan et al., 2022; Peterson et al., 2018) and computed similarity predictions. Rather than computing similarity across all possible triplets, these predictions were constrained to triplet contexts within superordinate categories (e.g. animals, vehicles). We expected the overall predictive performance to vary, given that these existing similarity rating datasets were based on a different similarity task or used different images. Yet, improvements are expected if fine-grained similarity can be estimated better with the large dataset than with the original dataset. Indeed, as shown in Figure 3B, seven out of eight datasets showed an improvement in predicted within-category similarity (mean improvement M=0.041 ± 0.007, p<0.001, bootstrap difference test). This demonstrates that within-category similarity could be estimated more reliably with the larger dataset, indicating that the estimated embedding indeed contained more fine-grained information.

Robustly decodable neural representations of objects

Having demonstrated the quality and overall reliability of the neuroimaging datasets, we aimed at validating their general suitability for studying questions about the neural representation of objects. To this end, we performed multivariate decoding on both the fMRI and MEG datasets, both at the level of individual object images, using the repeated image sets, and at the level of object category, using the 12 example images per category.
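The general idea of pairwise decoding can be sketched as follows: for every pair of conditions, a classifier is trained and tested with cross-validation, and accuracies are averaged across all pairs. The example uses a leave-one-out nearest-centroid classifier purely as a lightweight stand-in; the classifier, the cross-validation scheme, and the data layout are assumptions made for illustration and are not taken from the text.

```python
from itertools import combinations


def _centroid(patterns):
    """Mean pattern across a list of equally long response vectors."""
    return [sum(column) / len(patterns) for column in zip(*patterns)]


def _sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def pairwise_accuracy(trials_a, trials_b):
    """Leave-one-out nearest-centroid accuracy for two conditions.

    Each condition needs at least two trials (hypothetical layout:
    a list of response patterns per condition).
    """
    correct = total = 0
    for own, other in ((trials_a, trials_b), (trials_b, trials_a)):
        for i, held_out in enumerate(own):
            # Exclude the held-out trial from its own condition's centroid.
            train_own = own[:i] + own[i + 1:]
            d_own = _sq_dist(held_out, _centroid(train_own))
            d_other = _sq_dist(held_out, _centroid(other))
            correct += d_own < d_other
            total += 1
    return correct / total


def mean_pairwise_accuracy(data):
    """Average accuracy across all condition pairs; chance level is 0.5."""
    pairs = list(combinations(sorted(data), 2))
    return sum(pairwise_accuracy(data[a], data[b]) for a, b in pairs) / len(pairs)


# Toy response patterns for three well-separated conditions.
toy_data = {
    "dog": [[1.0, 0.0], [1.1, 0.1], [0.9, -0.1]],
    "car": [[0.0, 1.0], [0.1, 1.1], [-0.1, 0.9]],
    "cup": [[-1.0, -1.0], [-1.1, -0.9], [-0.9, -1.1]],
}
toy_accuracy = mean_pairwise_accuracy(toy_data)
```

Applied per searchlight or per time point, the same averaging yields an accuracy map or an accuracy time course.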
Demonstrating the possibility to decode image identity and object category thus serves as a baseline analysis for more specific future research analyses. When decoding the identity of object images, for fMRI we found above-chance decoding accuracies in all participants throughout large parts of early visual and occipitotemporal cortices (Figure 4A), with peak accuracies in early visual cortex, reaching 80% in participants F1 and F2. In MEG, we found above-chance decoding within an extended time-window (~80–1000 ms) peaking ~100 ms after stimulus onset, approaching 70–80% in participants M2 and M3 (Figure 4B).

Figure 4. Object image decoding in fMRI and MEG. (A) Decoding accuracies in the fMRI data from a searchlight-based pairwise classification analysis visualized on the cortical surface. (B) Analogous decoding accuracies in the MEG data plotted over time. The arrow marks the onset of the largest time window where accuracies exceed the threshold which was defined as the maximum decoding accuracy observed during the baseline period.

Moving from the level of decoding of individual images to the decoding of object category, for fMRI, accuracies now peaked in high-level visual cortex (Figure 5A). Likewise, for MEG the early decoding accuracies were less pronounced in absolute magnitude as compared to object image decoding (Figure 5C & D). Together, these results confirm that both object image and object category can be read out reliably from both neuroimaging datasets, demonstrating their general usefulness for addressing more specific research questions about object identity.

Figure 5. Object category decoding and multidimensional scaling of object categories in fMRI and MEG. (A) Decoding accuracies in the fMRI data from a searchlight-based pairwise classification analysis visualized on the cortical surface.
(B) Multidimensional scaling of fMRI response patterns in occipito-temporal category-selective brain regions for each individual subject. Each data point reflects the average response pattern of a given object category. Colors reflect superordinate categories. (C) Pairwise decoding accuracies of object category resolved over time in MEG for each individual subject. (D) Group average of subject-wise MEG decoding accuracies. Error bars reflect standard error of the mean across participants (n = 4). (E) Multidimensional scaling for the group-level response pattern at different timepoints. Colors reflect superordinate categories and highlight that differential responses can emerge at different stages of processing.

To demonstrate the utility of the datasets for exploring the representational structure in the neural response patterns evoked by different object categories, we additionally visualized their relationships in a data-driven fashion using multidimensional scaling (MDS) and highlighted clusters formed by superordinate categories. In fMRI, spatial response patterns across voxels in object-selective brain areas formed distinct clusters for the superordinate categories animals vs. food (Figure 5B). MEG sensor patterns showed differences between categorical clustering at early and late time points (e.g. early differences for vehicles vs. tools, late differences for animals vs. food), indicating that information about superordinate categories arises at different times (Figure 5E).

Large-scale replication of experimental findings: The case of animacy and si
