Audio-visual media possesses a remarkable ability to synchronise audiences’ neural, behavioural, and physiological responses. This synchronisation is considered to reflect some dimension of collective attention or engagement with the stimulus. But what is it about these stimuli that drives such strong engagement? Several properties of media stimuli may lead to synchronous audience response, from low-level audio-visual features to the story itself. Here, we present a study that separates low-level features from narrative by presenting participants with the same content in separate modalities. In this way, the presentations shared no low-level features, but participants experienced the same narrative. We show that synchrony in participants’ heart rate can be driven by narrative information alone. We computed both visual and auditory perceptual saliency for the content and found that narrative was approximately 10 times as predictive of heart rate as low-level saliency, although low-level audio-visual saliency had a small additive effect on heart rate. Further, heart rate synchrony was related to a separate cohort’s continuous ratings of immersion, and synchrony was likely to be higher at moments of increased narrative importance. Our findings demonstrate that high-level narrative dominates in the alignment of physiology across viewers.