This research introduces a precise, learning-free approach to locomotion mode prediction with potential for broad application in lower-limb wearable robotics. To the best of our knowledge, this study is the first to combine 3D reconstruction and Visual-Inertial Odometry (VIO) into a locomotion mode prediction method. The method yields robust prediction performance across diverse subjects and terrains, and is resilient to variations in camera view, walking direction, and step size, as well as to disturbances from moving obstacles, all without parameter adjustment. The proposed Depth-enhanced Visual-Inertial Odometry (D-VIO) is designed to operate within the computational constraints of wearable platforms while remaining robust to unpredictable human movements and sparse visual features. Its effectiveness, in terms of both accuracy and runtime, is substantiated through tests on an open-source dataset and through closed-loop evaluations. Comprehensive experiments validate its prediction accuracy across test conditions including subjects, scenarios, sensor mounting positions, camera views, step sizes, walking directions, and disturbances from moving obstacles. An overall prediction accuracy of 99.00% confirms the efficacy, generality, and robustness of the proposed method.
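To make the overall pipeline concrete, the sketch below illustrates one plausible way a learning-free predictor could classify the upcoming locomotion mode from a depth-derived point cloud and a VIO camera pose. This is a minimal, hypothetical example: the function names, frame conventions (gravity-aligned world frame, walking direction along the x-axis), geometric rules, and all thresholds are illustrative assumptions, not the authors' D-VIO implementation.

```python
import numpy as np

def predict_mode(points_cam, T_wc, look_ahead=(0.3, 1.2)):
    """Classify upcoming terrain from a point cloud in the camera frame
    and a gravity-aligned 4x4 camera pose T_wc estimated by VIO.
    All names and thresholds are illustrative, not from the paper."""
    # Transform points into the gravity-aligned world frame.
    pts_w = (T_wc[:3, :3] @ points_cam.T).T + T_wc[:3, 3]

    # Keep points in a strip ahead of the user along the (assumed)
    # walking direction, here the world x-axis after yaw alignment.
    ahead = (pts_w[:, 0] > look_ahead[0]) & (pts_w[:, 0] < look_ahead[1])
    strip = pts_w[ahead]
    if len(strip) < 50:  # too few points (sparse features) to decide
        return "level_ground"

    # Fit height vs. forward distance to estimate the terrain slope.
    slope, _ = np.polyfit(strip[:, 0], strip[:, 2], 1)

    # Height jumps between consecutive distance bins suggest stairs.
    bins = np.linspace(look_ahead[0], look_ahead[1], 10)
    idx = np.digitize(strip[:, 0], bins)
    heights = [np.median(strip[idx == i, 2])
               for i in range(1, 10) if np.any(idx == i)]
    steps = np.abs(np.diff(heights)) if len(heights) > 1 else np.array([0.0])

    if steps.max() > 0.10:                  # riser-sized discontinuity
        return "stair_ascent" if slope > 0 else "stair_descent"
    if abs(slope) > np.tan(np.radians(8)):  # sustained incline
        return "ramp_ascent" if slope > 0 else "ramp_descent"
    return "level_ground"
```

In a real system of this kind, the pose would come from the odometry front end and the point cloud from the 3D reconstruction module; the geometric thresholds above are placeholders standing in for whatever decision rules the actual method uses.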