Content and Language Integrated Learning (CLIL) is increasingly adopted globally, including in Taiwan’s educational initiatives, yet challenges remain in implementing effective CLIL practices, particularly in pedagogy and curriculum design. This study investigated the effectiveness of multimodal task designs, combining hands-on learning with poster presentations, in enhancing oral communicative competence within CLIL contexts. Employing a mixed-methods, quasi-experimental design with a comparative case study framework, the study assessed English oral communicative competence in four intact fourth-grade Taiwanese CLIL Social Studies classes. The hands-on learning group (EG, n = 40) engaged in activities such as Chinese Dumpling Making, Bird’s Nest Building, and Succulent Pot Designing, while the non-hands-on learning group (CG, n = 34) used traditional worksheets on the same topics. Both groups then gave poster presentations as part of their multimodal task design, during which students’ oral communicative competence was assessed using rubrics based on Coyle’s 4Cs framework, focusing on Content, Communication, and Cognition. Additionally, students’ cultural knowledge related to the hands-on topics was evaluated through written tests. To complement the quantitative data, qualitative data were collected from students’ self-reported reflections and video recordings of the interventions to inform the assessment of oral communicative competence within a CLIL framework. Results demonstrated that integrating hands-on activities significantly enhanced procedural content, communication (i.e., sentence complexity, pronunciation accuracy for target vocabulary, and presentation fluency), and cognitive abilities, confirming the efficacy of multimodal learning approaches in fostering linguistic and cognitive engagement. Post-test comparisons showed that the EG outperformed the CG in cultural knowledge acquisition across all three hands-on topics.
Student reflections indicated that the multimodal task design enriched their learning experiences. Video analysis of both groups’ interventions revealed that, despite considerable engagement and autonomy, EG students commonly used general English rather than target vocabulary, a pattern similar to that observed in the CG. These findings highlight the potential of diverse modalities in CLIL to enhance English content learning and oral skills, and they can inform future pedagogy and language strategies in Taiwan. The study also emphasizes the role of embodied learning, that is, the interplay between physical actions and cognitive processes, in facilitating deeper understanding of and engagement with subject matter within CLIL settings.