Despite growing recognition that multimodal input and digital virtual reality (VR) games can enhance EFL learners’ productive language skills, little empirical research has examined their impact on multimodal output, particularly writing and speaking, within content and language integrated learning (CLIL) science education. This quasi-experimental study addresses that gap by investigating whether VR games enhance fourth-grade CLIL students’ writing and speaking, as measured by their ability to convey scientific concepts in multimodal output. Grounded in self-regulated learning (SRL) theory, the study compares the effects of multimodal input embedded in VR games with those of traditional PowerPoint (PPT)-led games on students’ English poster designs (writing) and oral presentations (speaking), assessing both with the 4Cs (Content, Communication, Cognition, and Culture) framework. Participants were 81 fourth-grade students from three Taiwanese public elementary schools, divided into an experimental group (EG, n = 40) that used VR-based games and a control group (CG, n = 41) that used PPT-led games for content review. A mixed-methods approach combined quantitative rubric-based evaluations built on the 4Cs framework with qualitative rater reflections to show how the two review methods influenced student performance and creative output. Quantitative findings revealed that the EG significantly outperformed the CG in Content and Cognition for both poster designs and presentations, demonstrating greater depth, accuracy, and application of scientific concepts as well as stronger higher-order cognitive skills.
In Communication, the EG showed higher target vocabulary usage and sentence complexity in presentations, but no significant between-group differences were found for Communication in posters or for Culture outcomes. Expert raters’ reflections further indicated that students using VR games made more innovative and integrated use of scientific content, critical thinking, and multimodal expression, reflecting deeper engagement with the material. These findings provide empirical evidence that game-based virtual reality learning environments (VRLEs) can significantly enhance the content and cognitive dimensions of students’ multimodal output. Theoretically, the study extends the application of SRL to CLIL contexts by highlighting the potential of VRLEs to foster advanced cognitive skills, and it underscores the importance of multimodal assessments in capturing comprehensive learning outcomes. Future research should explore integrating cultural content into VR environments to enhance students’ cultural awareness and sensitivity.