Vessel-mounted cranes operate in complex marine environments, where precise measurement of cargo positions and attitudes is a key technological challenge to ensure operational stability and safety. This study introduces an integrated measurement system that combines vision and inertial sensing technologies, utilizing a stereo camera and two inertial measurement units (IMUs) to capture cargo motion in five degrees of freedom (DOF). By merging data from the stereo camera and IMUs, the system accurately determines the cargo’s position and attitude relative to the camera. The specific methodology is introduced as follows: First, the YOLO model is adopted to identify targets in the image and generate bounding boxes. Then, using the principle of binocular disparity, the depth within the bounding box is calculated to determine the target’s three-dimensional position in the camera coordinate system. Simultaneously, the IMU measures the attitude of the cargo, and a Kalman filter is applied to fuse the data from the two sensors. Experimental results indicate that the system’s measurement errors in the x, y, and z directions are less than 2.58%, 3.35%, and 3.37%, respectively, while errors in the roll and pitch directions are 3.87% and 5.02%. These results demonstrate that the designed measurement system effectively provides the necessary motion information in 5-DOF for vessel-mounted crane control, offering new approaches for pose detection of marine cranes and cargoes.