Abstract

Online object detection is a fundamental problem in time-critical video analysis applications. To address the performance limitations of one-stage object detectors under dense pedestrian occlusion, this paper improves YOLO-V4 through network structure optimization, a more efficient multi-scale feature fusion strategy, and a more specialized loss function design. First, a single-output YOLO-V4 network structure is proposed, which integrates image information from multiple scales through a designed ladder fusion strategy. This not only ensures that anchor aspect-ratio estimation remains driven by the training data, but also resolves the original network's invalid anchor distribution for objects of similar size. Second, we adjust the resolution ratio between the network's output feature map and the original input image to reduce label rewriting among training samples. Finally, the concept of repulsive force is introduced to optimize the bounding-box regression loss, which improves the model's robustness when detecting densely occluded pedestrians and enhances the practical value of YOLO-V4 in real application scenarios.
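The repulsion idea mentioned above can be illustrated with a small numerical sketch. Assuming the repulsive term follows the common Repulsion Loss formulation (a smoothed negative-log penalty on the intersection-over-ground-truth overlap between a predicted box and the non-target ground truths surrounding it), the function names and `sigma` parameter below are illustrative, not the paper's actual implementation:

```python
import numpy as np

def iog(pred, gt):
    """Intersection over ground-truth area (IoG) for boxes [x1, y1, x2, y2]."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / gt_area if gt_area > 0 else 0.0

def smooth_ln(x, sigma=0.5):
    """Smoothed -ln(1 - x) penalty: grows sharply as overlap approaches 1."""
    x = min(x, 1.0 - 1e-6)
    if x <= sigma:
        return -np.log(1.0 - x)
    # Linear continuation beyond sigma keeps the gradient bounded.
    return (x - sigma) / (1.0 - sigma) - np.log(1.0 - sigma)

def repulsion_term(pred_box, other_gt_boxes, sigma=0.5):
    """Penalty pushing a prediction away from non-target ground-truth boxes.

    Added to the usual attraction (regression) loss, this discourages a box
    from drifting onto a neighbouring pedestrian in a crowded scene.
    """
    overlaps = [iog(pred_box, g) for g in other_gt_boxes]
    if not overlaps:
        return 0.0
    return smooth_ln(max(overlaps), sigma)
```

A prediction that overlaps a neighbouring (non-target) pedestrian receives a positive penalty, while an isolated prediction receives none; in training, this term is summed with the ordinary attraction loss toward the designated target box.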
