BackgroundBenign paroxysmal positional vertigo (BPPV) is a prevalent form of vertigo that necessitates a skilled physician to diagnose by observing the nystagmus and vertigo resulting from specific changes in the patient’s position. In this study, we aim to explore the integration of eye movement video and position information for BPPV diagnosis and apply artificial intelligence (AI) methods to improve the accuracy of BPPV diagnosis.MethodsWe collected eye movement video and diagnostic data from 518 patients with BPPV who visited the hospital for examination from January to March 2021 and developed a BPPV dataset. Based on the characteristics of the dataset, we propose a multimodal deep learning diagnostic model, which combines a video understanding model, self-encoder, and cross-attention mechanism structure.ResultOur validation test on the test set showed that the average accuracy of the model reached 81.7%, demonstrating the effectiveness of the proposed multimodal deep learning method for BPPV diagnosis. Furthermore, our study highlights the significance of combining head position information and eye movement information in BPPV diagnosis. We also found that postural and eye movement information plays a critical role in the diagnosis of BPPV, as demonstrated by exploring the necessity of postural information for the diagnostic model and the contribution of cross-attention mechanisms to the fusion of postural and oculomotor information. Our results underscore the potential of AI-based methods for improving the accuracy of BPPV diagnosis and the importance of considering both postural and oculomotor information in BPPV diagnosis.