Abstract

Collaborative learning, driven by knowledge co‐construction and meaning negotiation, is a pivotal aspect of educational contexts. While the importance of gesture in conveying shared meaning is recognized, its role in collaborative group settings remains understudied. This gap hinders accurate and equitable assessment and instruction, particularly for linguistically diverse students. Advancements in multimodal learning analytics, leveraging sensor technologies, offer innovative solutions for capturing and analysing body movements. This study employs these novel approaches to demonstrate how learners' machine‐detected body movements during the learning process relate to their verbal and nonverbal contributions to the co‐construction of embodied math knowledge. The findings substantiate the feasibility of using learners' machine‐detected body movements as a valid indicator for inferring their engagement with the collaborative knowledge construction process. In addition, we empirically validate that the inferred levels of learner engagement indeed impact the desired learning outcomes of the intervention. This study contributes to our scientific understanding of multimodal approaches to knowledge expression and assessment in learning, teaching, and collaboration.

Practitioner notes

What is already known about this topic

Previous research emphasizes the importance of gestures as essential tools for constructing common ground and reflecting shared meaning‐making in learning and teaching contexts.
Prior studies in multimodal learning analytics (MMLA) suggest that certain forms of body movements and postures can be differentiated based on the automatic detection of upper body joint locations.
Empirical observations indicate that co‐thought gestures typically involve smaller hand or arm movements that are closer to the gesturer's body than co‐speech gestures used in interpersonal communication.

What this paper adds

This paper fills the research gap by examining the use of gestures in collaborative learning, offering insights into how individuals contribute verbally and nonverbally to collaborative knowledge construction.
This paper introduces the concept of using machine‐detected body movements as a viable proxy for inferring learners' engagement in collaborative knowledge‐building activities.
By leveraging sensor technologies for the automatic detection of body movements, the approach in this work seeks to overcome the time‐intensive and laborious process of manually coding gestures.

Implications for practice and/or policy

By recognizing the potential significance of learners' body movements in indicating engagement levels with collaborative knowledge‐building activities, instructors can set up computer‐supported collaborative learning (CSCL) environments that enable capturing these movements.
Given the crucial role of gestures in learning, teaching, and collaboration, educators can create more equitable formative assessment practices for linguistically diverse students by developing strategies that align with multimodal forms of knowledge expression.
Research can expand beyond mathematics to explore the transferability of these findings to other subjects, helping educators create comprehensive pedagogical approaches that leverage multimodal interactions across disciplines.
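As a minimal illustration of what "machine‐detected body movements" from automatically detected upper body joint locations could look like in practice, the sketch below uses an off‐the‐shelf pose‐estimation library (MediaPipe Pose, chosen here purely as an example; the study's actual sensing pipeline, joint set, and movement metric are not specified in this abstract) to track upper‐body landmarks in a video and derive a simple per‐frame movement magnitude.

```python
# Hypothetical sketch: estimating per-frame upper-body movement from video with
# MediaPipe Pose. The library choice, the joint set, and the displacement metric
# are illustrative assumptions, not the study's reported pipeline.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# Upper-body joints often used as a proxy for gesture-related movement.
UPPER_BODY = [
    mp_pose.PoseLandmark.LEFT_SHOULDER, mp_pose.PoseLandmark.RIGHT_SHOULDER,
    mp_pose.PoseLandmark.LEFT_ELBOW, mp_pose.PoseLandmark.RIGHT_ELBOW,
    mp_pose.PoseLandmark.LEFT_WRIST, mp_pose.PoseLandmark.RIGHT_WRIST,
]

def movement_per_frame(video_path: str) -> list[float]:
    """Return a crude movement magnitude per frame: the summed Euclidean
    displacement of upper-body landmarks between consecutive frames,
    in normalized image coordinates."""
    cap = cv2.VideoCapture(video_path)
    magnitudes, prev = [], None
    with mp_pose.Pose(static_image_mode=False) as pose:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks is None:
                prev = None  # pose lost; do not bridge the gap
                continue
            coords = [(result.pose_landmarks.landmark[j].x,
                       result.pose_landmarks.landmark[j].y) for j in UPPER_BODY]
            if prev is not None:
                magnitudes.append(sum(((x - px) ** 2 + (y - py) ** 2) ** 0.5
                                      for (x, y), (px, py) in zip(coords, prev)))
            prev = coords
    cap.release()
    return magnitudes
```

Frame-level magnitudes of this kind could then be aggregated over a collaborative episode and related to coded verbal and nonverbal contributions or engagement ratings; any such analysis would need to follow the study's own operationalization of body movement and engagement.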