Detecting small dark targets underwater, such as fishing nets, is critical to the operation of underwater robots. Existing techniques often demand substantial computational resources and struggle under the harsh imaging conditions found underwater. This study develops a lightweight, computationally efficient model that improves fishing net detection accuracy for safe and efficient underwater operations. The proposed Light-YOLO model introduces an attention mechanism based on sparse connectivity and deformable convolution, optimized for complex underwater lighting and visual conditions. This attention mechanism enhances detection performance by focusing on the key visual features of fishing nets, while the CoTAttention and SEAM modules further improve recognition accuracy through deeper feature interactions. The results demonstrate that the proposed Light-YOLO model achieves a precision of 89.3%, a recall of 80.7%, and an mAP@0.5 of 86.7%. Compared with other models, Light-YOLO attains the highest precision for its computational cost and is the lightest model at a comparable accuracy, providing an effective solution for fishing net detection and identification.
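For illustration only, the sketch below shows how a deformable convolution can be paired with a lightweight channel-attention gate inside a detection backbone, in the spirit of the deformable-convolution-based attention described above. The module name DeformAttnBlock, the SE-style gate, and all hyperparameters are assumptions for the example; this is not the paper's actual CoTAttention or SEAM implementation.

```python
# Minimal, self-contained sketch (assumed names and settings, not the paper's code):
# a deformable 3x3 convolution whose sampling offsets are predicted from the input,
# followed by a simple squeeze-and-excitation style channel gate.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformAttnBlock(nn.Module):
    """Hypothetical feature block: deformable conv + channel attention."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Offsets for a 3x3 deformable kernel: 2 coordinates per sampling point.
        self.offset = nn.Conv2d(channels, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(channels, channels, kernel_size=3, padding=1)
        # SE-style gate standing in for the attention mechanism (illustrative only).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset(x)        # predict sampling offsets from the input
        feat = self.deform(x, offsets)  # sample features at the deformed locations
        return feat * self.gate(feat)   # re-weight channels to emphasise net-like cues


if __name__ == "__main__":
    block = DeformAttnBlock(channels=64)
    y = block(torch.randn(1, 64, 80, 80))
    print(y.shape)  # torch.Size([1, 64, 80, 80])
```

In this kind of design, the offset branch lets the kernel follow thin, curved structures such as net meshes, while the channel gate suppresses background responses at little extra computational cost.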