Image recognition is vital to the autonomous navigation of intelligent ships. However, traditional methods often fail to accurately determine the spatial positions of maritime objects, especially under electromagnetic silence. We introduce StereoYOLO, an enhanced stereo vision-based object recognition and localization method that serves autonomous vessels using only image sensors. It is tailored to maritime object recognition and localization through the integration of convolutional and coordinated attention modules. The method uses stereo cameras to detect maritime objects in images and computes their relative positions with stereo vision algorithms. Experimental results indicate that StereoYOLO improves the mean Average Precision at an IoU threshold of 0.5 (mAP50) in object recognition by 5.23%. Furthermore, the variation in range measurement caused by changes in target angle is reduced by 6.12%. When measuring the distance to targets at varying ranges, the algorithm achieves an average positioning error of 5.73%, meeting the accuracy and robustness criteria for collision avoidance with maritime objects on experimental platform ships.
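
The abstract does not spell out which stereo vision algorithm converts a detection into a relative position. As a minimal sketch, assuming a rectified pinhole stereo pair and a target whose box center has already been matched between the left and right images, the standard triangulation relation Z = f·B/d could be applied as below. The `StereoRig` class, the `locate_target` function, and all numeric values are hypothetical and serve only as illustration; they are not taken from the StereoYOLO implementation.

```python
from dataclasses import dataclass


@dataclass
class StereoRig:
    """Hypothetical rectified stereo rig; every value is an illustrative assumption."""
    focal_px: float    # focal length, in pixels
    baseline_m: float  # distance between the two camera centers, in meters
    cx: float          # principal point x, in pixels
    cy: float          # principal point y, in pixels


def locate_target(rig, u_left, v_left, u_right):
    """Triangulate a target from matched pixel coordinates in a rectified pair.

    Returns (x, y, z) in meters in the left-camera frame:
    x to the right, y downward, z forward along the optical axis.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: target at infinity or mismatched")
    z = rig.focal_px * rig.baseline_m / disparity  # Z = f * B / d
    x = (u_left - rig.cx) * z / rig.focal_px       # back-project the pixel offset
    y = (v_left - rig.cy) * z / rig.focal_px
    return x, y, z


# Assumed example: detection box center at (740, 360) in the left image
# and at column 730 in the right image.
rig = StereoRig(focal_px=1000.0, baseline_m=0.5, cx=640.0, cy=360.0)
x, y, z = locate_target(rig, u_left=740.0, v_left=360.0, u_right=730.0)
print(f"x={x:.2f} m, y={y:.2f} m, z={z:.2f} m")
```

Under this model, a fixed disparity error produces a range error that grows roughly with the square of the range, which is one reason positioning error is usually evaluated across several target distances.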