Abstract

Teaching with virtual worlds provides new means for collaborative learning but creates challenges for teachers in terms of IT skills. To address these challenges, we developed a teaching model for using virtual worlds in classroom practices and applied it to Minecraft in several rounds of design-based research experiments. Our conceptual framework combines ideas from software engineering (sociotechnical congruence) and social sciences (intersubjectivity and emergence). Empirically, we addressed the problem of how shared understanding evolves in computer-mediated learning activities. We video-recorded classroom activities and analyzed them using interaction analysis. The teaching model engaged the students in two interdependent processes, referred to as objects: (1) a social object (discussions) that led to a shared knowledge object (video-recorded role-play) and (2) a technology object (Minecraft buildings) for staging the role-play. Our findings include an empirical phenomenon that we call emergent group understanding, which arose from the complex social interactions between social and technology objects when Minecraft was used as a virtual world in a social studies classroom. This revealed two connected subprocesses: (1) a spontaneous act of providing information to assist learners in contextualizing their actions and interactions against a common background, and (2) setting localized goals to guide future actions and interactions. This finding extends previous research by identifying fine-grained processes of intersubjectivity that contribute to collaborative learning. More generally, our teaching model addresses the problem of balancing creative and instructional learning goals.