Abstract

English-medium instruction (EMI) potentially offers pedagogical efficiency by simultaneously providing access to academic content and English affordances. EMI’s efficacy and effectiveness, however, remain unproven, and questions concerning students’ language proficiency remain unresolved. Because listening to lectures holds a central role in tertiary education, and the development of language skills is founded on second-language listening ability, key concerns for stakeholders are how much of an EMI lecture students can understand and what factors affect their lecture listening comprehension. Current methodologies used to capture listening comprehension data, such as tests, surveys, summaries, and transcript markings, do not capture the idiosyncratic, volatile, and multifaceted reactions to aural text phenomena that learners encounter during a lecture. This study, therefore, used an innovative foot switch mechanism to capture comprehension data in real time as students participated in an EMI lecture. These data were then used to guide a stimulated recall. The data showed learners failing to comprehend extensive sections of the lecture, while deeper analysis identified sections where issues concerning previously taught knowledge, top-down schema building, lexis, and bottom-up identification of sounds coincided for multiple learners. Extrapolating from these points of convergence, the study provides pedagogical recommendations.