Abstract

YOLOv5 remains one of the most widely used real-time detectors because of its strong accuracy and generalization. Compared with more recent detectors, however, its label assignment lags behind and leaves significant room for optimization. In particular, targets with varying shapes and poses are hard to recognize, and teaching a detector these features typically requires expert verification or group discussion during dataset labeling, especially in domain-specific settings. Deformable convolutions offer a partial solution, but using them extensively improves detection at the cost of a higher computational load. We introduce DP-YOLO, an enhanced detector that efficiently integrates deformable convolutions into the YOLOv5s backbone. Our approach also optimizes positive-sample selection during label assignment, making the process more principled. Experiments on the COCO benchmark validate the efficacy of DP-YOLO: with an input size of 640 × 640, it achieves 41.2 AP and runs at 69 FPS on an RTX 3090. DP-YOLO outperforms YOLOv5s by 3.2 AP with only a small increase in parameters and GFLOPs. These results demonstrate the effectiveness of the proposed method.
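The abstract does not spell out the positive-sample rule that DP-YOLO improves on. For orientation only, here is a minimal sketch of the baseline YOLOv5 rule as implemented in the Ultralytics repository, as we understand it. A ground-truth box matches an anchor when its width and height ratios to that anchor stay within a factor of `anchor_t` (4.0 by default). It is then assigned to the grid cell containing its centre plus up to two adjacent cells on the sides nearest the centre. The function names `anchor_matches` and `positive_cells` are hypothetical. This is not DP-YOLO's optimized assignment, which the abstract does not describe.

```python
"""Sketch of YOLOv5's *baseline* positive-sample selection (not DP-YOLO's rule).

Assumption: mirrors the anchor-ratio test and neighbour-cell offsets used in
the Ultralytics YOLOv5 `build_targets`; names here are illustrative only.
"""

ANCHOR_T = 4.0  # YOLOv5 default 'anchor_t' hyperparameter
G = 0.5  # neighbour-cell bias used by YOLOv5


def anchor_matches(box_wh, anchor_wh, anchor_t=ANCHOR_T):
    """True if both width and height ratios lie strictly within (1/anchor_t, anchor_t)."""
    rw = box_wh[0] / anchor_wh[0]
    rh = box_wh[1] / anchor_wh[1]
    return max(rw, 1 / rw, rh, 1 / rh) < anchor_t


def positive_cells(gx, gy, grid_w, grid_h, g=G):
    """Grid cells treated as positives for a target centred at (gx, gy), in grid units."""
    cx, cy = int(gx), int(gy)
    cells = [(cx, cy)]  # the cell that contains the box centre
    if gx % 1 < g and gx > 1:
        cells.append((cx - 1, cy))  # centre lies in left half: add left neighbour
    if gy % 1 < g and gy > 1:
        cells.append((cx, cy - 1))  # centre lies in top half: add top neighbour
    gxi, gyi = grid_w - gx, grid_h - gy  # coordinates measured from the far edges
    if gxi % 1 < g and gxi > 1:
        cells.append((cx + 1, cy))  # centre lies in right half: add right neighbour
    if gyi % 1 < g and gyi > 1:
        cells.append((cx, cy + 1))  # centre lies in bottom half: add bottom neighbour
    return cells


if __name__ == "__main__":
    print(anchor_matches((30, 60), (16, 30)))  # ratios about 1.9 and 2.0: matched
    print(positive_cells(3.3, 4.7, 20, 20))  # centre cell plus left and bottom neighbours
```

Because the rule depends only on box-to-anchor aspect ratios and centre offsets, it ignores object shape and pose, which is one way to read the abstract's claim that YOLOv5's label assignment leaves room for optimization.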
