To address the challenges of detecting and tracking moving objects in video surveillance, this paper focuses on image-based detection of dynamic entities in scenes with numerous moving objects, dense targets, and complex backgrounds. Building on the You Only Look Once version 3 (YOLOv3) framework, it proposes improvements to image segmentation and data filtering, which together form a multi-object detection algorithm based on an improved YOLOv3 framework and designed specifically for video applications. Experimental validation demonstrates the feasibility of the algorithm: detection success rates exceed 60% for the videos "jogging", "subway", "video 1", and "video 2", and consistently surpass 80% for "jogging" and "video 1". Accuracy is slightly lower for "Bolt" and "Walking2", where success rates remain around 70%. In a comparison with other algorithms, the proposed method achieves a tracking accuracy of 0.822, surpassing particle filters, the Discriminative Scale Space Tracker (DSST), and the Scale Adaptive Multiple Features (SAMF) algorithm, which indicates superior overall target-tracking performance. The improved YOLOv3-based multi-object detection and tracking algorithm also demonstrates robust filtering and detection capabilities in noise-resistance experiments, making it suitable for a range of practical detection tasks. It mitigates limitations such as missed detections, false positives, and imprecise localization, improving the efficiency and accuracy of target detection and offering useful insights for researchers working on object detection, tracking, and recognition in video surveillance.
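The abstract reports per-video success rates without defining the metric. As a minimal sketch, assuming the overlap-based definition common in tracking benchmarks (a frame counts as a success when the intersection over union between the predicted and ground-truth boxes reaches a threshold), the computation could look as follows. The function names, the `(x, y, w, h)` box format, the 0.5 threshold, and the sample boxes are illustrative assumptions, not the paper's actual evaluation protocol or data.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(predicted, ground_truth, threshold=0.5):
    """Fraction of frames whose predicted box overlaps the ground truth
    with IoU >= threshold (threshold value is an assumption)."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        return 0.0
    hits = sum(1 for p, g in zip(predicted, ground_truth) if iou(p, g) >= threshold)
    return hits / len(predicted)


# Hypothetical three-frame sequence: exact match, partial overlap, complete miss.
truth = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)]
preds = [(0, 0, 10, 10), (5, 0, 10, 10), (20, 20, 10, 10)]
rate = success_rate(preds, truth, threshold=0.5)
print(f"success rate: {rate:.3f}")
```

Under this definition, a reported success rate above 80% would mean that more than four in five frames of a sequence meet the overlap threshold; a stricter threshold lowers the rate for the same predictions.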