Blast lung injury (BLI) is a major concern after explosive events and can have severe consequences for victims. The injury is often irreversible, which makes early confirmation of symptoms and early intervention essential. Tele-ultrasound (tele-US) is a novel approach that lets healthcare professionals interpret ultrasound (US) images remotely, and it can therefore provide timely and accurate diagnostic assessments of BLI. In this study, we propose an explainable two-stream tele-US diagnosis method for BLI based on incremental multimodal multistage fusion (IM3S-Fusion). This method is the first deep learning-based diagnosis method for BLI and, specifically, the first exploration of video Transformers in this field. It integrates optical flow into US videos, enhancing model interpretability and improving diagnostic accuracy. By incorporating an incremental broad learning system (IBLS), IM3S-Fusion can update its parameters incrementally with new samples in a lifelong learning fashion, particularly with previously misdiagnosed samples. The results show its superiority in diagnosing BLI, with accuracy, recall, precision, specificity and F1-score of (87.50 ± 2.76)%, (89.06 ± 2.71)%, (88.09 ± 5.26)%, (85.71 ± 7.14)% and (88.43 ± 2.24)%, respectively. Explanatory assistance, including fused US videos with optical flow, class activation map videos and confidence scores, is provided to support doctors in their diagnosis.
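For readers less familiar with the five reported metrics, the following minimal sketch shows how they are conventionally derived from binary confusion-matrix counts, and how a "mean ± standard deviation" summary over repeated runs could be formed. The function names `binary_metrics` and `summarize`, the example counts, and the use of the sample standard deviation are illustrative assumptions, not details taken from this study.

```python
import statistics


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn count positive (e.g. BLI) cases, tn/fp count negative cases.
    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)        # sensitivity: positives correctly found
    precision = tp / (tp + fp)     # predicted positives that are correct
    specificity = tn / (tn + fp)   # negatives correctly rejected
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
    }


def summarize(values):
    """Mean and sample standard deviation over repeated runs or folds.

    Assumption: the sample (n - 1) standard deviation is used here; the
    study does not state which estimator underlies its "±" values.
    """
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical counts for a single evaluation split (not from the study).
metrics = binary_metrics(tp=9, fp=3, tn=7, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because each reported value is an average over runs, the mean F1-score need not equal the F1-score computed from the mean precision and mean recall.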