Abstract

Divisive normalization of the neural responses by the activity of the neighboring neurons has been proposed as a fundamental operation in the nervous system based on its success in predicting neural responses recorded in primate electrophysiology studies. Nevertheless, experimental evidence for the existence of this operation in the human brain is still scant. Here, using functional MRI, we examined the role of normalization across the visual hierarchy in the human visual cortex. Using stimuli from the two categories of human bodies and houses, we presented objects in isolation or in clutter and asked participants to attend or ignore the stimuli. Focusing on the primary visual area V1, the object-selective regions LO and pFs, the body-selective region EBA, and the scene-selective region PPA, we first modeled single-voxel responses using a weighted sum, a weighted average, and a normalization model. Although the weighted sum and weighted average models also made acceptable predictions in some conditions, the response to multiple stimuli could generally be better described by a model that takes normalization into account. We then determined the observed effects of attention on cortical responses and demonstrated that these effects were predicted by the normalization model, but not by the weighted sum or the weighted average models. Our results thus provide evidence that the normalization model can predict responses to objects across shifts of visual attention, suggesting the role of normalization as a fundamental operation in the human brain.

Editor's evaluation

This study on object-based attention furthers our understanding of the role of normalization across the visual hierarchy in the human visual cortex.
The authors provide solid functional MRI evidence that supports their claims, demonstrating that the normalization model predicts the observed effect when participants selectively attend to one of two stimulus categories. The paper is an important contribution to the fields of perceptual and cognitive neuroscience. https://doi.org/10.7554/eLife.75726.sa0

Introduction

The brain makes use of fundamental operations to perform neural computations in various modalities and different regions. Divisive normalization has been proposed as one of these fundamental operations. Under this computation, the response of a neuron is determined based on its excitatory input divided by a factor representing the activity of a pool of nearby neurons (Heeger, 1992; Carandini et al., 1997; Carandini and Heeger, 2011). Normalization was first introduced based on responses in the cat primary visual cortex (Heeger, 1992), and evidence of its operation in higher regions of the monkey visual cortex has also been demonstrated both during passive viewing (Bao and Tsao, 2018) and when attention is directed towards a stimulus (Reynolds and Heeger, 2009; Lee and Maunsell, 2010; Ni et al., 2012; Ni and Maunsell, 2019). Normalization has also been proposed as a critical operation in the human brain based on evidence demonstrating the sublinear addition of responses to multiple stimuli in the visual cortex (Bloem and Ling, 2019). Nevertheless, in lieu of directly testing the normalization model to resolve multiple-stimulus representation, several previous studies have shown that a weighted average model can account for multiple-stimulus responses in the monkey brain (Zoccolan et al., 2005; Macevoy and Epstein, 2009; Reddy et al., 2009; Kliger and Yovel, 2020).
The only exception is a recent electrophysiology study, which showed that in the category-selective regions of the monkey brain, a winner-take-all, but not averaging, rule can explain neural responses in many cases (Bao and Tsao, 2018). Bao and Tsao, 2018 further demonstrated that the normalization model predicts such winner-take-all behavior. It is not clear whether this discrepancy has emerged as a result of different explored regions of the brain, or due to the diversity in stimuli or the task performed by the participants. In addition to regional computations for multiple-stimulus representation, the visual cortex relies on top-down mechanisms such as attention to select the most relevant stimulus for detailed processing (Moran and Desimone, 1985; Desimone and Duncan, 1995; Chun et al., 2011; Baluch and Itti, 2011; Noudoost et al., 2010; Maunsell, 2015; Thiele and Bellgrove, 2018; Itthipuripat et al., 2014; Moore and Zirnsak, 2017; Buschman and Kastner, 2015). Attention works through increasing the response gain (Treue and Martínez Trujillo, 1999; McAdams and Maunsell, 1999) or contrast gain (Reynolds et al., 2000; Martínez-Trujillo and Treue, 2002) of the attended stimulus. Previous studies have demonstrated how the normalization computation accounts for these observed effects of attention in the monkey brain. They have suggested that normalization attenuates the neural response in proportion to the activity of the neighboring neuronal pool (Reynolds and Heeger, 2009; Ni et al., 2012; Boynton, 2009; Lee et al., 2009). These studies have focused on space-based (Reynolds and Heeger, 2009; Ni et al., 2012; Lee et al., 2009) or feature-based (Ni and Maunsell, 2019) attention. 
While it has been suggested that these different forms of attention affect neural responses in similar ways, there exist distinctions in their reported effects, such as different time courses (Hayden and Gallant, 2005) and the extent to which they affect different locations in the visual field (Serences and Boynton, 2007; Womelsdorf et al., 2006), suggesting that there are common sources as well as differences in modulation mechanisms between these forms of attention (Ni and Maunsell, 2019). This leaves open the question of whether normalization can explain the effects of object-based attention. In the human visual cortex, normalization has been speculated to underlie response modulations in the presence of attention, with evidence provided both by behavioral studies of space-based (Herrmann et al., 2010) and feature-based (Herrmann et al., 2012; Schwedhelm et al., 2016) attention, as well as neuroimaging studies of feature-based attention (Bloem and Ling, 2019). Although previous studies have qualitatively suggested the role of normalization in the human visual cortex (Bloem and Ling, 2019; Kliger and Yovel, 2020; Itthipuripat et al., 2014; Zhang et al., 2016), evidence for directly testing the validity of the normalization model in predicting human cortical responses in a quantitative way remains scarce. A few studies have demonstrated the quantitative advantage of normalization-based models compared to linear models in predicting human fMRI responses using gratings, noise patterns, and single objects (Kay et al., 2013a; Kay et al., 2013b), as well as moving checkerboards (Aqil et al., 2021; Foster and Ling, 2021). However, whether normalization can also be used to predict cortical responses to multiple objects, and if and to what extent it can explain the modulations in response caused by attention to objects in the human brain remain unanswered. 
To fill this gap and to explore the discrepancies reported about multiple-stimulus responses, we aimed to evaluate the predictions of the normalization model against observed responses to visual objects in several regions of the human brain in the presence and absence of attention. In an fMRI experiment using conditions with isolated and cluttered stimuli and recording the response with or without attention, we provide a comprehensive account of normalization in different regions of the visual cortex, showing its success in adjusting the gain related to each stimulus when it is attended or ignored. We also demonstrate that normalization is closer to averaging in the absence of attention, as previously reported by several studies (Zoccolan et al., 2005; Macevoy and Epstein, 2009; Kliger and Yovel, 2020), but that the results of the weighted average model and the normalization model diverge to a greater extent in the presence of attention. Our work in the human brain, along with previous studies of normalization in the monkey and human brain, suggests the role of normalization as a canonical computation in the primate brain.

Results

Attention modulates responses to isolated and paired stimuli

In a blocked-design fMRI paradigm, human participants (N = 19) viewed semi-transparent gray-scale stimuli from the two categories of houses and human bodies (Figure 1a). Each experimental run consisted of one-stimulus (isolated) and two-stimulus (paired) blocks, with attention directed either to an object stimulus or to the color of the fixation point. There was an additional fixation color block in each run with no object stimuli, in which the participants were asked to attend to the fixation point color. The experiment, therefore, had a total of eight conditions (four isolated, three paired, and one fixation condition; see Figure 1c).
In paired blocks, we superimposed the two stimuli to minimize the effect of spatial attention and force participants to use object-based attention (Figure 1b and c). Participants were asked to perform a one-back repetition detection task on the attended object, or a color change detection task on the fixation point (Figure 1b, see Methods for details). Independent localizer runs were used to localize the primary visual cortex (V1), the object-selective regions in the lateral occipital cortex (LO) and posterior fusiform gyrus (pFs), the extrastriate body area (EBA), and the parahippocampal place area (PPA) for each participant (Figure 1d).

Figure 1. Stimuli, paradigm, and regions of interest. (a) The two stimulus categories (body and house), with the ten exemplars of the body category. (b) Experimental paradigm including the timing of the trials and the inter-stimulus interval. In the example block depicted on the left, both stimulus categories were presented, and the participant was cued to attend to the house category. The two stimuli were superimposed in each trial, and the participant had to respond when the same stimulus from the house category appeared in two successive trials. The color of the fixation point randomly changed in some trials from red to orange, but the participants were asked to ignore the color change. The example block depicted on the right illustrates the condition in which stimuli were ignored and participants were asked to attend to the fixation point color, and respond when they detected a color change. Subjects were highly accurate in performing these tasks (see Figure 1—figure supplement 1). (c) The eight task conditions in each experimental run. For illustration purposes, we have shown the attended category in each block with orange outlines. The outlines were not present in the actual experiment.
(d) Regions of interest for an example participant, including the primary visual cortex V1, the object-selective regions LO and pFs, the body-selective region EBA, and the scene-selective region PPA.

Each task condition was named based on the presented stimuli and the target of attention, with B and H denoting the presence of body and house stimuli, respectively, and the superscript $at$ denoting the target of attention. Therefore, the seven task conditions include $B^{at}$, $B^{at}H$, $BH^{at}$, $H^{at}$, $B$, $H$, and $BH$. For instance, the $H^{at}$ condition refers to the isolated house condition with attention directed to house stimuli, and the $BH$ condition refers to the paired condition with attention directed to the fixation point color. Overall, the average accuracy was higher than 86% in all conditions. Averaged across participants, accuracy was 94%, 89%, 86%, 93%, 94%, 96%, 95%, and 96% for the $B^{at}$, $B^{at}H$, $BH^{at}$, $H^{at}$, $B$, $H$, and $BH$ conditions and the fixation block with no stimulus, respectively. A one-way ANOVA test across conditions showed a significant effect of condition on accuracy (F(7,126) = 8.24, p < 0.0001) and reaction time (F(7,126) = 22.57, p < 0.0001). As expected, post-hoc t-tests showed that this was due to lower performance in the $B^{at}H$ and $BH^{at}$ conditions (see Figure 1—figure supplement 1). There was no significant difference in performance between any other conditions (ps > 0.07, corrected).

To examine the cortical response in different task conditions, we fit a general linear model and estimated the regression coefficients for each voxel in each condition. Figure 2 illustrates the average voxel coefficients for different conditions in the five regions of interest (ROIs), including V1, LO, pFs, EBA, and PPA. Note that we have not included the responses related to the fixation block with no stimulus since this condition was only used to select the voxels that were responsive to the presented stimuli in each ROI (see Methods).
We observed that the average voxel coefficients related to the four conditions in which attention was directed to the body or the house stimuli (the first four conditions, $B^{at}$, $B^{at}H$, $BH^{at}$, $H^{at}$) were generally higher than the response related to the last three conditions ($B$, $H$, and $BH$) in which the body and house stimuli were unattended (ts > 4, ps < 0.01, corrected). This is in agreement with previous research indicating that attention to objects increases their cortical responses (Reddy et al., 2009; Roelfsema et al., 1998; O’Craven et al., 1999).

Figure 2. Average fMRI regression coefficients and voxel preference for the two categories in all regions of interest (ROIs). (a–e) Average fMRI regression coefficients for each condition are illustrated in the five ROIs. Each condition’s label denotes the presented stimuli and the target of attention, with B and H, respectively, denoting the presence of body and house stimuli and the superscript $at$ denoting the target of attention. Therefore, the seven task conditions include $B^{at}$, $B^{at}H$, $BH^{at}$, $H^{at}$, $B$, $H$, and $BH$. For instance, the $H^{at}$ condition refers to the isolated house condition with attention directed to houses, and the $BH$ condition refers to the paired condition with attention directed to the fixation point color. Error bars represent standard errors of the mean for each condition, calculated across participants after removing the overall between-subject variance. N = 19 human participants. (f) The ratio of voxels preferring bodies and houses in each ROI. Both the regression coefficients and the voxel preference ratios were consistent across odd and even runs (see Figure 2—figure supplement 1).

Looking more closely at the results in the regions EBA and PPA that have strong preferences for body and house stimuli, respectively, it seems that the effect of attention interacts with the regions’ preference.
For instance, in the body-selective region EBA, the response to attended body stimuli in isolation is similar to the response to attended body stimuli paired with unattended house stimuli (compare the $B^{at}$ and $B^{at}H$ bars). On the other hand, the response to attended house stimuli in the isolated condition is significantly less than the response to attended house stimuli paired with unattended body stimuli. We can observe similar results in PPA, but not in V1 or the object-selective regions LO and pFs. Note, however, that the latter three regions do not have a strong preference for one stimulus versus the other. Therefore, in order to examine the interaction between attention and preference more closely, we determined preferences at the voxel level in all ROIs. We defined the preferred (P) and null (N) stimulus categories for each voxel in each ROI according to the voxel’s response to isolated body and isolated house conditions. Figure 2f shows the percentage of voxels in each region that were selective to bodies and houses averaged across participants. As illustrated in the figure, in the object-selective regions LO and pFs, almost half of the voxels were selective to each category, while in the EBA and PPA regions, the general preference of the region prevailed. (Even though these regions were selected based on their preference, the noise in the fMRI data and other variations due to imperfect registration led to some voxels showing different preferences in the main session compared to the localizer session; Peelen and Downing, 2005.) After determining voxel preferences, we rearranged the seven task conditions according to each voxel’s preference. The conditions are hereafter referred to as $P^{at}$, $P^{at}N$, $PN^{at}$, $N^{at}$, $P$, $PN$, and $N$, with P and N denoting the presence of the preferred and null stimuli, respectively, and the superscript $at$ denoting the attended category. Mean voxel responses in the five ROIs for all task conditions are illustrated by navy lines in Figure 3a–e.
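The relabeling of conditions by voxel preference described above can be illustrated with a minimal sketch. It assumes, for illustration only, that preference is decided by comparing the unattended isolated responses (B versus H); the text states only that the isolated body and house conditions were used. The function name, the dictionary layout, and the plain-text labels (`Bat` for the attended body condition, and so on) are hypothetical.

```python
# Hypothetical plain-text labels: a trailing "at" marks the attended category.
BODY_HOUSE_TO_PN = {
    # Voxel prefers bodies: body -> P, house -> N.
    "body": {"Bat": "Pat", "BatH": "PatN", "BHat": "PNat", "Hat": "Nat",
             "B": "P", "BH": "PN", "H": "N"},
    # Voxel prefers houses: house -> P, body -> N.
    "house": {"Hat": "Pat", "BHat": "PatN", "BatH": "PNat", "Bat": "Nat",
              "H": "P", "BH": "PN", "B": "N"},
}


def relabel_by_preference(voxel_betas):
    """Return (preferred category, responses keyed by P/N condition labels).

    voxel_betas maps the seven body/house condition labels to one voxel's
    regression coefficients. Assumption: preference is taken from the
    unattended isolated conditions B and H.
    """
    preferred = "body" if voxel_betas["B"] >= voxel_betas["H"] else "house"
    mapping = BODY_HOUSE_TO_PN[preferred]
    relabeled = {mapping[cond]: beta for cond, beta in voxel_betas.items()}
    return preferred, relabeled


# Made-up coefficients for a single house-preferring voxel.
example_voxel = {"Bat": 1.0, "BatH": 1.4, "BHat": 1.9, "Hat": 2.0,
                 "B": 0.5, "H": 1.2, "BH": 1.0}
example_preferred, example_relabeled = relabel_by_preference(example_voxel)
```

Because the mapping is applied per voxel, voxels with opposite preferences within the same ROI can be averaged under a common set of P/N labels.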
Note that although the seven conditions constitute a discrete and not a continuous variable, we have connected the responses in attended conditions (in which body or house stimuli were attended) and unattended conditions (in which body and house were ignored and the fixation point color was attended) separately. This was done for visual purposes and ease of understanding.

Figure 3. Divisive normalization explains voxel responses in different stimulus conditions. (a–e) Average fMRI responses and model predictions in the five regions of interest. Navy lines represent average responses. Light blue, gray, and orange lines show the predictions of the weighted sum, the weighted average, and the normalization models, respectively. The x-axis labels represent the 7 task conditions, $P^{at}$, $P^{at}N$, $PN^{at}$, $N^{at}$, $P$, $PN$, $N$, with P and N denoting the presence of the preferred and null stimuli and the superscript $at$ denoting the attended category. For instance, $P$ refers to the condition in which the unattended preferred stimulus was presented in isolation, and $P^{at}N$ refers to the paired condition with the attended preferred and unattended null stimuli. Error bars represent standard errors of the mean for each condition, calculated across participants after removing the overall between-subject variance. N = 19 human participants. (f) Mean explained variance, averaged over voxels in each region of interest for the 5 conditions predicted by the three models. Light blue, gray, and orange bars show the average variance explained by the weighted sum, the weighted average, and normalization models, respectively. Error bars represent the standard errors of the mean. N = 19 human participants.
Dashed lines above each set of bars indicate the noise ceiling in each ROI, with the light blue shaded area representing the standard errors of the mean calculated across participants (see Figure 3—figure supplement 1 for an example illustration of how the goodness of fit was calculated for each voxel). As observed in the figure, the normalization model was a better fit for the data compared to the weighted sum (ps < 0.02) and the weighted average (ps < 0.0001) models. Simulation results demonstrate that this superiority is not related to the higher number of parameters or the nonlinearity of the normalization model (see Figure 3—figure supplement 2).

We observed that the mean voxel response was generally higher when each stimulus was attended compared to the condition in which it was ignored. For instance, the response in the $P^{at}$ condition (in which the isolated preferred stimulus was attended) was higher than in the $P$ condition (where the isolated preferred stimulus was ignored) in LO, pFs, and PPA (ts > 3.6, ps < 0.01, corrected), marginally higher in EBA (t(18) = 2.69, p = 0.07, corrected), and not significantly higher in V1 (t(18) = 2.52, p = 0.1, corrected). Similarly, comparing the $N$ and $N^{at}$ conditions in each ROI, we observed an increase in response caused by attention in all ROIs (ts > 4, ps < 0.01, corrected) except for V1 (t(18) = 2.4, p = 0.13, corrected). A similar trend of response enhancement due to attention could also be observed in the paired conditions: attending to either stimulus increased the response in all ROIs (ts > 4.4, ps < 0.01, corrected) except for V1 (ts < 2.59, ps > 0.08, corrected). In all cases, the effect of attention was absent or only marginally significant in V1, which is not surprising since attentional effects are much weaker (McAdams and Maunsell, 1999) or even absent (Luck et al., 1997) in V1 compared to the higher-level regions of the occipito-temporal cortex.
Next, we asked whether we could predict these response variations and attentional modulations caused by the change in the presented stimuli and the target of attention using three different models.

Divisive normalization explains voxel responses in different stimulus conditions

We used the three models of weighted sum, weighted average, and normalization to predict voxel responses in different task conditions. Based on the weighted sum model, the response to multiple stimuli is determined by the sum of the responses to each individual stimulus presented in isolation, and attention to each stimulus increases the part of the response associated with the attended stimulus. For instance, in the presence of a null and a preferred stimulus with attention to the preferred stimulus, the response can be determined by $R_{P^{at},N} = \beta R_P + R_N$, with $R_{P^{at},N}$, $R_P$, and $R_N$ denoting the response elicited by both stimuli with attention directed to the preferred stimulus, the response to the isolated preferred stimulus, and the response to the isolated null stimulus, respectively. $\beta$ is the attention-related parameter. According to the weighted average model, the response to multiple stimuli is determined by the average of isolated-stimulus responses, weighted by the parameter related to attention. Therefore, with an attended preferred and an ignored null stimulus, the response can be written as $R_{P^{at},N} = \frac{\beta R_P + R_N}{2}$. Finally, based on the normalization model, the response to a stimulus is determined based on the excitation due to that stimulus and the suppression due to the neighboring neuronal pool. Therefore, the response to an attended preferred and an ignored null stimulus is determined by $R_{P^{at},N} = \frac{\beta c_P L_P + c_N L_N}{\beta c_P + c_N + \sigma}$, where $L_P$ and $L_N$ respectively denote the excitation caused by the preferred and the null stimulus, and $\sigma$ represents the semi-saturation constant. $c_P$ and $c_N$ are the respective contrasts of the preferred and null stimuli.
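As a rough numerical illustration of the three equations above, the following sketch evaluates each model for a paired condition. The function names and example values are hypothetical. Each stimulus carries an attention weight that equals β when it is attended and 1 when it is ignored; applying the same rule when the null stimulus is attended is an assumption made here by symmetry, since the text spells out only the attended-preferred case.

```python
def weighted_sum(r_p, r_n, w_p=1.0, w_n=1.0):
    """Paired response as a sum of isolated responses, each scaled by its
    attention weight (beta if attended, 1 if ignored)."""
    return w_p * r_p + w_n * r_n


def weighted_average(r_p, r_n, w_p=1.0, w_n=1.0):
    """Paired response as the attention-weighted sum divided by two."""
    return (w_p * r_p + w_n * r_n) / 2.0


def normalization(l_p, l_n, sigma, w_p=1.0, w_n=1.0, c_p=1.0, c_n=1.0):
    """Excitatory drive divided by the normalization pool plus the
    semi-saturation constant sigma. Contrast c is 1 when the stimulus is
    present and 0 when it is absent."""
    drive = w_p * c_p * l_p + w_n * c_n * l_n
    pool = w_p * c_p + w_n * c_n + sigma
    return drive / pool


# Made-up values: attention directed to the preferred stimulus, beta = 1.5.
beta = 1.5
sum_prediction = weighted_sum(2.0, 1.0, w_p=beta)            # 1.5*2 + 1
average_prediction = weighted_average(2.0, 1.0, w_p=beta)    # (1.5*2 + 1) / 2
norm_prediction = normalization(2.0, 1.0, sigma=0.5, w_p=beta)
# Isolated, unattended preferred stimulus: null contrast set to zero.
norm_isolated = normalization(2.0, 1.0, sigma=0.5, c_n=0.0)
```

With these toy numbers the weighted sum gives 4.0, the weighted average 2.0, and the normalization model 4/3, which shows how the divisive pool keeps the paired response below the linear sum.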
Zero contrast for a stimulus denotes that the stimulus is not present in the visual field. In our experiment, we set contrast values to one when a stimulus was presented, and to zero when the stimulus was not presented (see Methods for detailed descriptions of models). Although many studies have demonstrated that responses to multiple stimuli are added sublinearly in the visual cortex (Heeger, 1992; Bloem and Ling, 2019; Reddy et al., 2009; Aqil et al., 2021), it has been suggested that for weak stimuli, response summation can approach a linear or even a supralinear regime (Rubin et al., 2015; Heuer and Britten, 2002). Since the stimuli we used in this experiment were presented in a semi-transparent form and were therefore not in full contrast, we found it probable that the response might be closer to a linear summation regime in some cases. We thus used the weighted sum model to examine whether the response approaches linear summation in any region. To compare the three models in their ability to predict the data, we split the fMRI data into two halves (odd and even runs) and estimated the model parameters separately for each voxel of each participant twice: once using the first half of the data, and a second time using the second half of the data. All comparisons of data with model predictions were made using the left-out half of the data in each case. All model results illustrate the average of these two cross-validated predictions. Note that this independent prediction is critical since the numbers of parameters in the three models are different. Possible over-fitting in the normalization model with more parameters will not affect the independent predictions (Kay et al., 2013b). 
The predictions of the three models for the five modeled task conditions are illustrated in Figure 3a–e (the two isolated ignored conditions P and N were excluded as they were used by the weighted sum and the weighted average models to predict responses in the remaining five conditions, see Methods). As evident in the figure, the predictions of the normalization model (in orange) are generally better than the predictions of the weighted sum and the weighted average models (light blue and gray, respectively) in all regions. To quantify this observation, we calculated the goodness of fit for each voxel by taking the square of the correlation coefficient between the predicted model response and the respective fMRI responses across the five modeled conditions (Figure 3—figure supplement 1). We also calculated the noise ceiling in each region separately as the r-squared of the correlation between the odd and even halves of the data. Given that the correlation between the model and the data cannot exceed the reliability of the data (as calculated by the correlation between the data from odd and even runs), the r-squared can also not exceed the squared split-half reliability. The noise ceiling (squared split-half reliability), therefore, determines the highest possible goodness of fit a model can reach. The results are illustrated in Figure 3f. We first compared the goodness of fit of the three models across the five ROIs using a 3×5 repeated measures ANOVA. The results showed a significant main effect of model (F(2,36)=72.9,p<0.0001) and ROI (F(4,72)=26.66,p<0.0001), and a significant model by ROI interaction (F(8,144)=24.96,p<0.0001). On closer inspection, the normalization model was a better fit to the data than both the weighted sum (ps<0.02,corrected) and the weighted average (ps<0.0001,corrected) models in all ROIs. Since the normalization model had more parameters, we also used the AIC measure to correct for the difference in the number of parameters. 
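The goodness-of-fit and noise-ceiling calculations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, each input is a list of one voxel's values across the five modeled conditions, and the two cross-validation folds are combined by averaging their r-squared values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences.
    Undefined (division by zero) if either sequence is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """Squared correlation coefficient."""
    return pearson_r(x, y) ** 2


def cross_validated_fit(pred_from_odd, pred_from_even, data_odd, data_even):
    """Compare each half's predictions with the left-out half and average
    the two r-squared values (assumed way of combining the folds)."""
    fit_a = r_squared(pred_from_odd, data_even)
    fit_b = r_squared(pred_from_even, data_odd)
    return 0.5 * (fit_a + fit_b)


def noise_ceiling(data_odd, data_even):
    """Squared split-half reliability of the data."""
    return r_squared(data_odd, data_even)


# Made-up responses of one voxel across the five modeled conditions.
odd_half = [2.0, 1.8, 1.5, 1.1, 1.3]
even_half = [2.1, 1.7, 1.6, 1.0, 1.2]
ceiling = noise_ceiling(odd_half, even_half)
```

Because both quantities are squared correlations over the same five conditions, a model's goodness of fit can be read directly against the ceiling for that voxel or region.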
The normalization model was a better fit according to the AIC measure as well (see Supplementary file 2). It is noteworthy that while the weighted average model performed better than the weighted sum model in LO and EBA (ps < 0.002, corrected), it was not significantly better in pFs and PPA (ps > 0.37, corrected), and worse than the weighted sum model in V1 (p < 0.0001, corrected). We then calculated the normalization model’s r-squared difference from the noise ceiling (NRD) for each ROI (Equation 7). NRD is a measure of the model’s ability to account for the explainable variation in the data; the lower the difference between the noise ceiling and a model’s goodness of fit, the more successful that model is in predicting the data. We ran a one-way ANOVA to test for the effect of ROI on NRD, and observed that this measure was not significantly different across ROIs (F(4,72) = 0.58, p = 0.61), demonstrating that the normalization model was equally successful across ROIs in predicting the explainable variation in the data. Interestingly, focusing only on the paired condition in which none of the stimuli were attended (the $PN$ condition), the results of the weighted average model were closer to those of the normalization model (in some regions, the isolated gray and orange data points in Figure 3a–e are similarly close to the navy data point). For this condition, the predictions of the normalization model were significantly closer to the data compared to the predictions of the weighted average model in V1, pFs, and PPA (ps < 0.03, corrected) but not significantly closer to the data in LO and EBA (ps > 0.09, corrected). These results are in agreement with previous studies suggesting that the weighted average model provides good predictions of neural and voxel responses in the absence of attention (Zoccolan et al., 2005; Macevoy and Epstein, 2009; Kliger and Yovel, 2020).
However, when considering all the attended and unattended conditions, our results show that the normalization model is a generally better fit across all ROIs. To ensure that the superiority of the normalization model over the weighted sum and weighted average models was not caused by the normalization model’s nonlinearity or its higher number of parameters, we ran simulations of three neural populations. Neurons in each population calculated responses to multiple stimuli and attended stimuli by a summing, an averaging, or a normalizing rule, respectively (see Methods). We then used the three models to predict the population responses. Our simulation results demonstrate that despite the higher number of parameters, the normalization model is only a better fit for the population of normalizing neurons and not for summing or averaging neurons, as illustrated in Figure 3—figure supplement 2. These results confirm that the better fits of the normalization model cannot be related to the model’s nonlinearity or its higher number of parameters.

Normalization accounts for the change in response with the shift of attention

Next, comparing the responses in different conditions, we observed two features in the data. First, for the paired conditions, shifting attention from the preferred to the null stimulus caused a reduction in voxel responses. We calculated this reduction in response for each voxel as $(P^{at}N - PN^{at})$ (Figure 4a, top panel). This response change was significantly greater than zero in all ROIs (ts > 6.2, ps < 0.0001, corrected) except V1 (t(18) = 0.66, p = 0.97, corrected). Beca
