Abstract

Deep-learning algorithms for detecting and recognizing moving human targets have made breakthrough progress. However, in applications with strict real-time requirements, existing real-time deep-learning detection and recognition networks struggle to reach high detection accuracy. How to position and recognize moving human targets accurately while keeping detection real-time therefore remains an urgent problem in this field. Building on the single shot multi-box detector (SSD) real-time detection network, this paper proposes a real-time detection, positioning, and recognition network based on multi-scale feature fusion (IMFF-SSD) that improves both positioning accuracy and recognition accuracy. First, the paper analyzes the multi-scale features extracted by the SSD network. Through feature fusion, it combines the position-sensitive information provided by low-level detail features with the context information provided by high-level semantic features, which effectively improves the positioning accuracy of the target prediction layer in the SSD network. Second, a feature-embedded prediction structure is designed that strengthens the semantics of target features without changing the spatial resolution of the SSD prediction layer, and that embeds low-level detail features into high-level semantic features so that both contribute to target prediction. This improves the SSD network's accuracy in recognizing moving human targets at all scales. Experimental results show that, with these two improvements combined, the proposed network achieves higher positioning accuracy and motion recognition accuracy than the original SSD, and that it also compares favorably with some current algorithms for detecting and recognizing moving human targets.
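The fusion idea described above can be illustrated with a minimal sketch. It assumes single-channel feature maps stored as nested lists, nearest-neighbour upsampling, and element-wise addition as the fusion operator; the abstract does not state which operator IMFF-SSD uses, and the function names `upsample_nearest` and `fuse` are hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        # Repeat every value `factor` times along the width...
        wide = [v for v in row for _ in range(factor)]
        # ...and repeat the widened row `factor` times along the height.
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse(low_level, high_level):
    """Fuse a fine (low-level) map with a coarse (high-level) map.

    Assumption: fusion is an element-wise sum after the coarse map is
    upsampled to the fine map's resolution. This is an illustrative choice,
    not necessarily the operator used in the paper.
    """
    factor = len(low_level) // len(high_level)
    upsampled = upsample_nearest(high_level, factor)
    return [
        [a + b for a, b in zip(row_low, row_up)]
        for row_low, row_up in zip(low_level, upsampled)
    ]


# Toy 4x4 low-level map (fine spatial detail) and 2x2 high-level map (context).
low = [[1, 0, 0, 2],
       [0, 1, 2, 0],
       [0, 3, 1, 0],
       [4, 0, 0, 1]]
high = [[10, 20],
        [30, 40]]

fused = fuse(low, high)
for row in fused:
    print(row)
```

Each fused cell keeps the fine-grained value from the low-level map and adds the context value of the coarse cell that covers it, so the result has the low-level map's spatial resolution.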

Highlights

  • Today, surveillance cameras are used in all corners of our lives

  • This article analyzes the multi-scale features extracted from the single shot multi-box detector (SSD) network

  • The samples are divided into a training set and a test set at a ratio of approximately 5:1, and the training set is then augmented following the data augmentation method used for SSD network training data
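As a rough illustration of the 5:1 split mentioned in the last highlight, the sketch below shuffles a list of samples and holds out one sixth for testing. The function name `split_5_to_1`, the fixed seed, and the sample count of 600 are assumptions made for the example; the source does not describe how the split was drawn.

```python
import random


def split_5_to_1(samples, seed=0):
    """Shuffle samples and split them into train/test at roughly 5:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = len(shuffled) // 6            # one part in six goes to the test set
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 600 sample indices.
train, test = split_5_to_1(range(600))
print(len(train), len(test))
```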

Summary

INTRODUCTION

Surveillance cameras are used in all corners of our lives, and their role is not limited to surveillance. Chu et al. [13] proposed a single shot multi-box detector (SSD) model, which improves detection accuracy while also taking detection speed into account. This paper analyzes the multi-scale features extracted from the SSD network and, through feature fusion, combines the position-sensitive information provided by low-level detail features with the context information provided by high-level semantic features, effectively improving the positioning accuracy of the target prediction layer in the SSD network.

RELATED WORKS
MULTI-SCALE FEATURE FUSION REAL-TIME
CONCLUSION