TikTok, a popular short-form video platform, has evolved into a versatile communication tool with potential applications in education. TikTok videos combine multiple modes, such as visuals, text, audio, and movement, to engage users, so examining these multimodal components can offer valuable insights for vocabulary acquisition. Understanding how these modes work together on TikTok to convey meaning and sustain engagement could inform more effective methods of vocabulary teaching. This paper examines two frameworks for analysing multimodal elements in pedagogical materials: Serafini's (2010) and Machin and Mayr's (2012). The findings suggest that a key distinction lies in Serafini's narrower focus on pedagogy, compared with Machin and Mayr's broader approach, which scrutinises how meaning is constructed. The study concludes that Serafini's framework is the more appropriate of the two for analysing multimodal elements in pedagogical materials because it directly addresses students' learning objectives. These results have implications for students, educators, and educational institutions.