We would like to thank all commentators for their insightful and thought-provoking commentaries. We find it gratifying that the commentators represent a diverse array of expertise and that they have enriched the discussion with their different perspectives on this topic. The commentaries have also identified important open questions, which is a key step toward advancing the field. Here, we discuss some of the important theoretical issues and observations raised by de Haas and Rees [1], Spence [2], Alais [3], Vroomen [4], and Barone [5].

de Haas and Rees [1] emphasize that crossmodal interactions appear to occur at multiple levels of processing. They argue that any theory of multisensory perception should account for this phenomenon, and we agree. As discussed in the target paper [6], crossmodal interactions have been reported for a variety of visual tasks. These range from low-level perceptual tasks, such as detection and motion perception, to high-level tasks, such as object recognition. Even for a simple low-level task, there appear to be interactions between modalities at multiple levels of processing. For example, as noted by de Haas and Rees [1], in the numerosity judgment task (discussed in Sections 3 and 4 of the target paper), fMRI revealed interactions in areas including the superior colliculus, V1, and STS [7,8], and MEG revealed early-onset interactions in occipital regions followed by later interactions in parietal and frontal areas [9]. Interactions at multiple levels of processing are indeed consistent with a Bayesian inference scheme in which both likelihoods (sensory representations) and priors (expectations) play a role in the perceptual process. While the combination of sensory information (likelihood functions) can occur at an early neural level of processing (e.g., V1, superior colliculus, or thalamus), the priors may involve interactions at a range of neural processing levels. For example, in