The brain is capable of integrating signals from multiple sensory modalities. Such multisensory integration can occur in areas that are commonly considered unisensory, such as the planum temporale (PT), which is part of the auditory association cortex. However, the roles of different afferents (feedforward vs. feedback) to PT in multisensory processing are not well understood. We addressed this question by examining laminar activity patterns in different topographical subfields of human PT under unimodal and multisensory stimulation. To this end, we adopted an advanced mesoscopic (sub-millimeter) fMRI methodology at 7 T, acquiring BOLD (blood-oxygen-level-dependent contrast, which has higher sensitivity) and VAPER (integrated blood volume and perfusion contrast, which has superior laminar specificity) signals concurrently, and performed all analyses in native fMRI space, taking advantage of an identical acquisition for functional and anatomical images. We found a division of function between visual and auditory processing in PT and distinct feedback mechanisms in different subareas. Specifically, anterior PT was activated more by auditory inputs and received feedback modulation in superficial layers. This feedback depended on task performance and likely arose from top-down influences from higher-order multimodal areas. In contrast, posterior PT was preferentially activated by visual inputs and received visual feedback in both superficial and deep layers, which is likely projected directly from the early visual cortex. Together, these findings provide novel insights into the mechanism of multisensory interaction in human PT at the mesoscopic spatial scale.