Objectives: Freehand three-dimensional (3D) ultrasound (US) is of great significance for clinical diagnosis and treatment. It is often achieved with the aid of external devices (optical and/or electromagnetic, etc.) that monitor the position and orientation of the US probe. However, this external monitoring is often degraded by the imaging environment, for example by optical occlusions and/or electromagnetic (EM) interference.

Methods: To address these issues, we integrated a binocular camera and an inertial measurement unit (IMU) on a US probe. We then built a tightly coupled model using the unscented Kalman filter on Lie groups (UKF-LG), which combines visual and inertial information to infer the probe's movement; from this movement, the position and orientation of each US image frame are calculated. Finally, the volume data were reconstructed with a voxel-based hole-filling method.

Results: We conducted calibration experiments, a tracking performance evaluation, phantom scans, and real-scenario scans. The proposed system achieved an accumulated frame position error of 3.78 mm and an orientation error of 0.36°, and it reconstructed high-quality 3D US images in both phantom and real scenarios.

Conclusions: The proposed method has been demonstrated to enhance the robustness and effectiveness of freehand 3D US. Follow-up research will focus on improving the accuracy and stability of the multi-sensor fusion to make the system more practical in clinical environments.
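
The last two Methods steps, placing each US frame in space from the tracked probe pose and then filling empty voxels, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the 4x4 homogeneous transforms `T_world_probe` (probe pose from tracking) and `T_probe_image` (probe-to-image calibration), the pixel spacing `scale_mm`, and the choice of a 3x3x3 neighbourhood mean for hole filling are all assumptions made for illustration.

```python
import numpy as np

def pixel_to_world(u, v, scale_mm, T_probe_image, T_world_probe):
    """Map pixel (u, v) of a US frame to world coordinates in mm.

    All arguments are hypothetical: scale_mm is (sx, sy) in mm per pixel,
    and both transforms are 4x4 homogeneous matrices.
    """
    # The pixel lies in the image plane (z = 0); scaling converts pixels to mm.
    p_image = np.array([u * scale_mm[0], v * scale_mm[1], 0.0, 1.0])
    # Chain: image frame -> probe frame -> world frame.
    return (T_world_probe @ T_probe_image @ p_image)[:3]

def fill_holes(volume, filled):
    """Fill each empty voxel with the mean of its filled 3x3x3 neighbours.

    `filled` is a boolean array marking voxels that received US data.
    Empty voxels with no filled neighbour are left unchanged.
    """
    out = volume.astype(float).copy()
    nx, ny, nz = volume.shape
    for x, y, z in np.argwhere(~filled):
        # Clip the neighbourhood at the volume boundary.
        xs = slice(max(x - 1, 0), min(x + 2, nx))
        ys = slice(max(y - 1, 0), min(y + 2, ny))
        zs = slice(max(z - 1, 0), min(z + 2, nz))
        mask = filled[xs, ys, zs]
        if mask.any():
            out[x, y, z] = volume[xs, ys, zs][mask].mean()
    return out

# Usage: a probe translated 10 mm along x, identity calibration.
T_world_probe = np.eye(4)
T_world_probe[:3, 3] = [10.0, 0.0, 0.0]
point = pixel_to_world(2, 3, (0.5, 0.5), np.eye(4), T_world_probe)
# point is [11.0, 1.5, 0.0]
```

In practice the hole-filling neighbourhood and weighting would be tuned to the voxel size and scan density; the plain mean above is only the simplest option.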