ABSTRACT Manual visual interpretation is currently the common means of deciphering multibeam water column images to determine the position and state of lost shipping containers, but its recognition efficiency and accuracy leave room for improvement. The You Only Look Once (YOLO) series of models offers strong real-time target detection capability, while the Segment Anything Model (SAM) offers strong zero-shot transferability. This study proposes a detection and extraction method for lost shipping containers that combines the two models. First, a YOLO series model detects lost shipping container targets in a single-frame water column image. The bounding boxes output by the best-performing detection model are then used as prompts for the SAM. Finally, the SAM extracts the lost shipping container targets from the images through its zero-shot transferability. Experimental results from the Pearl River estuary show that the combination of YOLOv5-n and EdgeSAM-3× achieves the best overall performance: its precision and recall for detecting lost shipping containers both exceed 95%, and for target extraction it yields the best Intersection over Union, recall, and F1 scores.
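The detect-then-prompt pipeline summarized above can be sketched as follows. The model calls are stubbed placeholders (YOLO and SAM inference are not reproduced here, and the function names are illustrative, not from the paper); only the pixel-level evaluation metrics the abstract reports, Intersection over Union and F1, are implemented concretely.

```python
# Sketch of the two-stage pipeline: a YOLO detector produces bounding
# boxes, which are passed as prompts to a SAM-style segmenter. Model
# inference is stubbed; the mask metrics are implemented concretely.

def detect_boxes(image):
    # Stand-in for YOLOv5-n inference: would return (x1, y1, x2, y2) boxes.
    ...

def segment_with_box_prompt(image, box):
    # Stand-in for EdgeSAM with a box prompt: would return a set of
    # (row, col) mask pixels for the container target.
    ...

def mask_iou(pred, truth):
    """Intersection over Union between two pixel sets."""
    if not pred and not truth:
        return 1.0
    return len(pred & truth) / len(pred | truth)

def mask_f1(pred, truth):
    """F1 score from pixel-level precision and recall."""
    tp = len(pred & truth)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: a 3x3 predicted mask against a diagonally shifted truth mask.
pred = {(r, c) for r in range(3) for c in range(3)}
truth = {(r, c) for r in range(1, 4) for c in range(1, 4)}
print(round(mask_iou(pred, truth), 3))  # 4 overlap / 14 union -> 0.286
```

In a real run, `detect_boxes` would be replaced by the trained YOLOv5-n model and `segment_with_box_prompt` by EdgeSAM-3×; the metric functions are what the reported IoU and F1 comparisons rest on.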