Abstract

Multimodal (inter)action analysis offers a powerful and robust methodology for the study of action and interaction between social actors, their environment, and the objects and tools within it. Yet its application to synchronous multimodal online data sets, e.g. (inter)actions via videoconferencing, remains limited. Drawing on our research into teacher-learner (inter)actions in instruction-giving fragments of synchronous multimodal online language lessons, we describe and illustrate how we adapted and extended some of its methodological and analytical tools. These include (1) the use of a grounded theory approach in delineating and identifying higher-level actions, (2) the embodiment and disembodiment of frozen actions, (3) electronic print mode, (4) semiotic lag, (5) semiotic (mis)alignment, (6) modal density (mis)alignment, and (7) the achievement of modal density through brisk modal shifts, in addition to modal intensity and complexity. We conclude with a call for further educational research on online teaching platforms that uses this framework to reach a richer understanding of the (inter)actions between social actors with particular roles and identities (teachers-learners), their environment, and the objects and tools within it, which bring their “own material properties, feel and techniques of use, affordances and limitations” (Chun, Dorothy, Richard Kern & Bryan Smith. 2016. Technology in language use, language teaching, and language learning. The Modern Language Journal 100. 64–80: 65).