The development of multimodality has prompted growing research on its use to improve students' English language competency. However, no recent review has analyzed multimodality in English language learning in higher education. This systematic review examines 34 research articles published from 2013 to 2024. It explores how multimodal pedagogies are applied in higher education, the methods and materials used to help learners acquire English language skills, the English language skills acquired through multimodality, and the main results of using multiple modes. The review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) standards and adopts a thorough search strategy across electronic databases, including Web of Science and Scopus. We found that (1) the implementation of multimodality contributes to learners' English language proficiency in English for academic purposes (EAP) and English for specific purposes (ESP) education; (2) digital multimodality and nonverbal communication predominate in the higher education classroom, for example, gesture, kinesics, spatial position, facial expression, and gaze, as well as the use of three-dimensional (3D) environments and virtual reality (VR); and (3) a multimodal approach is advantageous for improving higher education learners' vocabulary, reading, speaking, and writing skills, and there is a positive connection between the implementation of multimodality and the development of learners' communicative ability. This systematic review highlights existing research gaps and outlines potential avenues for future investigation aimed at conceptualizing and assessing learners' skills through multimodal approaches.