Abstract

Traditional visual-inertial odometry (VIO) degrades significantly in challenging lighting environments, which substantially impairs state estimation. Thermal imaging cameras form images from the thermal radiation emitted by objects, making them insensitive to lighting variations. However, integrating thermal infrared data into conventional visual odometry is difficult because thermal images have low texture, poor contrast, and high noise. In this paper, we propose a tightly coupled approach that fuses information from visible-light cameras, thermal imaging cameras, and inertial measurement units (IMUs). First, we apply adaptive bilateral filtering and Sobel gradient enhancement to infrared images, reducing noise while enhancing edge contrast. Second, we combine the Sage-Husa adaptive filter with the iterated extended Kalman filter (IEKF) to mitigate the impact of non-Gaussian noise on the system. Finally, we evaluate the proposed system on both public datasets and real-world experiments across four scenarios: normal lighting, low light, low light with camera shake, and challenging lighting. Compared with IEKF, our method reduces the localization root mean square error (RMSE) by 58.69%, 57.24%, 60.23%, and 30.87% in these four scenarios, respectively.
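To make the second step more concrete, the following is a minimal sketch of the Sage-Husa idea in isolation: the measurement-noise variance is re-estimated recursively from the filter innovations using a fading weight. It is not the system proposed here. It assumes a scalar random-walk state with a direct measurement (H = 1), it omits the IEKF relinearization entirely, and the function name `sage_husa_scalar`, its parameters, and the positivity floor on the variance are all illustrative assumptions.

```python
import random


def sage_husa_scalar(measurements, q=1e-4, r0=1.0, b=0.98, x0=0.0, p0=1.0):
    """Scalar random-walk Kalman filter with a Sage-Husa estimate of R.

    Hypothetical illustration: q is the process-noise variance, r0 the
    initial measurement-noise variance, b the forgetting factor (0 < b < 1).
    """
    x, p, r = x0, p0, r0
    estimates, r_history = [], []
    for k, z in enumerate(measurements):
        # Predict: random-walk model x_k = x_{k-1} + w, Var(w) = q.
        p_pred = p + q
        # Innovation for the measurement model z = x + v (H = 1).
        e = z - x
        # Sage-Husa fading weight d_k = (1 - b) / (1 - b^(k+1)).
        d = (1.0 - b) / (1.0 - b ** (k + 1))
        # Recursive noise estimate: R_k = (1 - d) R_{k-1} + d (e^2 - P_pred).
        # The floor is an added safeguard that keeps the variance positive.
        r = max((1.0 - d) * r + d * (e * e - p_pred), 1e-9)
        # Standard Kalman update using the adapted variance.
        gain = p_pred / (p_pred + r)
        x = x + gain * e
        p = (1.0 - gain) * p_pred
        estimates.append(x)
        r_history.append(r)
    return estimates, r_history


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data: constant true value 5.0, noise standard deviation 0.5.
    zs = [5.0 + random.gauss(0.0, 0.5) for _ in range(3000)]
    est, rs = sage_husa_scalar(zs, x0=5.0, p0=0.01)
    print("final state estimate:", est[-1])
    print("final noise-variance estimate:", rs[-1])
```

In this toy setting the adapted variance should settle near the true noise variance (0.25 for the synthetic data above) without being supplied in advance. In a full system the same recursion would operate on covariance matrices inside the filter's update step.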