Abstract

To address the problem of low pedestrian detection accuracy, we propose a detection algorithm based on an optimized Mask R-CNN that draws on recent deep learning research to improve both the accuracy and the speed of detection. Because pedestrian targets in natural scene images are affected by illumination, posture, background, and other factors, the target information is highly complex. SKNet is used to replace part of the convolution modules in the deep residual network so that features are extracted more effectively and the model can adaptively select the most suitable convolution kernel during training. In addition, based on statistical regularities of pedestrian shapes, the length-width ratio of the anchor boxes is modified to better match the natural characteristics of pedestrian targets. Finally, a pedestrian target dataset is built by selecting suitable pedestrian images from the COCO dataset and is expanded by adding noise and applying median filtering. The optimized algorithm is compared with the original algorithm and several other mainstream target detection algorithms on this dataset; the experimental results show that both the detection accuracy and the detection speed of the optimized algorithm are improved, and its detection accuracy exceeds that of the other mainstream target detection algorithms.
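As a concrete illustration of the dataset-expansion step mentioned in the abstract, the following is a minimal sketch using OpenCV and NumPy; the noise level, kernel size, and file name are illustrative assumptions rather than values reported in the paper.

```python
# Minimal sketch of the dataset-expansion step (noise + median filtering),
# assuming common OpenCV/NumPy tooling; sigma and ksize are illustrative.
import cv2
import numpy as np

def add_gaussian_noise(image: np.ndarray, sigma: float = 15.0) -> np.ndarray:
    """Return a copy of `image` with zero-mean Gaussian noise added."""
    noise = np.random.normal(0.0, sigma, image.shape).astype(np.float32)
    noisy = image.astype(np.float32) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)

def median_filtered(image: np.ndarray, ksize: int = 3) -> np.ndarray:
    """Return a median-filtered copy of `image` (ksize must be odd)."""
    return cv2.medianBlur(image, ksize)

# Usage: each source image yields two additional training samples.
img = cv2.imread("person_000001.jpg")  # hypothetical file name
augmented = [add_gaussian_noise(img), median_filtered(img)]
```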

Highlights

  • We select a network structure with stronger model migration ability. The anchor-free method is based on the complete feature pyramid and therefore requires a huge amount of computation, while the anchor-based method reduces the number of pyramid levels, which greatly reduces computation, gives faster detection speed, and achieves higher detection accuracy (a sketch of anchor generation with pedestrian-oriented aspect ratios follows this list)

  • Based on the Mask R-CNN target detection algorithm, we make several optimizations to improve the accuracy of pedestrian target detection. The main work of this article consists of three parts

  • We selected 1,000 pedestrian images from the COCO “person” category, with scenes covering different viewing angles, lighting conditions, and pedestrian densities as much as possible to increase the complexity of the data. Of these 1,000 images, 900 are used as the training set and 100 as the test set. The training set contains 892 positive sample images with 3,262 pedestrian targets, and the test set contains 99 positive sample images with 478 pedestrian targets
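The sketch below illustrates how anchor boxes with pedestrian-oriented aspect ratios can be generated for an anchor-based detector. The specific ratio and scale values are illustrative assumptions; the paper derives its length-width ratios from dataset statistics.

```python
# Minimal sketch of anchor generation with pedestrian-oriented aspect ratios.
# The ratio values below (taller-than-wide boxes) are illustrative assumptions.
import numpy as np

def make_anchors(base_size: float, scales, aspect_ratios):
    """Return (N, 4) anchors as (x1, y1, x2, y2) centred at the origin.

    aspect_ratio is interpreted as height / width, so ratios > 1 give the
    tall, narrow boxes typical of standing pedestrians.
    """
    anchors = []
    for s in scales:
        for r in aspect_ratios:
            area = (base_size * s) ** 2
            w = np.sqrt(area / r)
            h = w * r
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

# Example: one FPN level, with ratios skewed toward upright human shapes.
anchors = make_anchors(base_size=16, scales=[8], aspect_ratios=[1.0, 2.0, 2.5])
print(anchors.shape)  # (3, 4)
```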


Summary

[Figure: SKNet module – branch outputs a and b are fused and weighted via Softmax]

We select a network structure with stronger model migration ability. The anchor-free method is based on the complete feature pyramid and therefore requires a huge amount of computation, while the anchor-based method reduces the number of pyramid levels, which greatly reduces computation, gives faster detection speed, and achieves higher detection accuracy. Mask R-CNN [6] is a further extension of this series of deep learning target detection algorithms: it adds a segmentation branch to the Faster R-CNN detection branch, and the segmentation task is performed simultaneously with the classification and regression tasks. Based on the Mask R-CNN target detection algorithm, we make several optimizations to improve the accuracy of pedestrian target detection. In the ResNet, the SKNet lightweight network module replaces part of the convolution modules so that the model can adaptively select the most suitable convolution kernel during training, which increases the quality of the feature representation and improves detection accuracy. The image is first fed into the backbone network composed of the ResNet and the FPN. The backbone network extracts shared feature maps that combine the coordinate information of the detected target positions with appearance and texture information. Then, the RPN generates candidate regions from these shared feature maps.
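As a concrete illustration of the selective-kernel idea described above, the following is a minimal PyTorch sketch of an SK convolution unit in the spirit of SKNet's split-fuse-select design. The two-branch layout (3×3 and dilated 3×3), channel counts, and reduction ratio are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal PyTorch sketch of a selective-kernel (SK) convolution unit in the
# spirit of SKNet's split-fuse-select design; sizes are illustrative.
import torch
import torch.nn as nn

class SKConv(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Split: two branches with different receptive fields (3x3 and a
        # dilated 3x3 that emulates a 5x5 kernel).
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        # Fuse: global context -> compact descriptor.
        hidden = max(channels // reduction, 8)
        self.fc_reduce = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True))
        # Select: per-branch attention logits, softmax across branches.
        self.fc_select = nn.Linear(hidden, channels * 2)

    def forward(self, x):
        u3, u5 = self.branch3(x), self.branch5(x)
        u = u3 + u5                                  # fuse the branches
        s = u.mean(dim=(2, 3))                       # global average pooling
        z = self.fc_reduce(s)
        logits = self.fc_select(z).view(-1, 2, u.size(1))
        attn = torch.softmax(logits, dim=1)          # compete across branches
        a3, a5 = attn[:, 0, :, None, None], attn[:, 1, :, None, None]
        return a3 * u3 + a5 * u5                     # adaptively weighted sum

# Usage: SKConv(256)(torch.randn(1, 256, 56, 56)) returns a (1, 256, 56, 56) tensor.
```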

The classification loss and the regression loss are defined as follows.
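Mask R-CNN inherits its classification and box-regression losses from Faster R-CNN; the standard formulation is reproduced here for reference (the paper's exact notation may differ), with binary cross-entropy for classification and smooth-L1 for regression:

```latex
% Standard Faster R-CNN-style definitions, assumed here as a reference;
% p_i is the predicted probability, p_i^* the ground-truth label,
% t_i the predicted box offsets, and t_i^* the ground-truth offsets.
L_{\mathrm{cls}}(p_i, p_i^*) = -\bigl[\, p_i^* \log p_i + (1 - p_i^*) \log (1 - p_i) \,\bigr]

L_{\mathrm{reg}}(t_i, t_i^*) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\!\left(t_{i,j} - t_{i,j}^*\right),
\qquad
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
0.5\,x^2, & |x| < 1,\\
|x| - 0.5, & \text{otherwise.}
\end{cases}
```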

