The minimal condition for individuals to interact is that they exchange information through at least one sensory channel. Once such informational coupling is established, basic forms of coordinated behavior can emerge spontaneously from the interaction. Our previous study revealed that dyads performing a joint finger-tapping task showed different coordination dynamics under visual versus auditory coupling. This observation led us to propose the ‘modality-dependent hypothesis’, which posits that coordination dynamics are shaped by the sensory modality mediating the informational coupling. However, because modalities differ inherently in the access they give to the spatiotemporal features of a perceived movement, we also formulated the alternative ‘kinematic hypothesis’, which posits that the differences in dynamics would vanish if equivalent kinematic information were available across modalities. Forty participants (N = 40), grouped into twenty dyads, performed a joint finger-tapping task under visual and auditory coupling, with access to kinematic information manipulated as either discrete or continuous. Contrary to our initial predictions, the results strongly supported the ‘modality-dependent hypothesis’: visual and auditory coupling consistently yielded distinct attractor dynamics, regardless of the access to kinematic information. Furthermore, every auditory coupling condition produced higher levels of synchronization than its visual counterpart. These findings suggest that differences in interpersonal synchronization are driven predominantly by sensory modality rather than by the continuity of kinematic information. Our study highlights the significance of sensorimotor interactions in interpersonal synchronization and points to the potential of sonification strategies for supporting motor training and rehabilitation.