Abstract

Object detection methods aim to identify all target objects in an image and determine their categories and positions in order to achieve machine vision understanding. Numerous approaches have been proposed to solve this problem, mainly inspired by methods from computer vision and deep learning. However, existing approaches often perform poorly on small, dense objects, and can even fail to detect objects that have undergone random geometric transformations. In this study, we compare and analyse mainstream object detection algorithms and propose a multi-scale deformable convolutional object detection network to address the challenges faced by current methods. Our analysis demonstrates performance on par with, or even better than, state-of-the-art methods. We use deep convolutional networks to obtain multi-scale features, and add deformable convolutional structures to cope with geometric transformations. We then fuse the multi-scale features by up-sampling in order to perform the final object recognition and region regression. Experiments show that the proposed framework improves the accuracy of detecting small target objects with geometric deformation, with significant improvements in the trade-off between accuracy and speed.
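
To make the idea of a deformable convolutional structure concrete, the following is a minimal NumPy sketch of how one output value of a 3x3 deformable convolution could be computed: each kernel tap samples the feature map at its regular grid position plus a fractional offset, using bilinear interpolation. The function names `bilinear_sample` and `deformable_conv_at` are hypothetical, and the offsets are passed in by the caller here, whereas in a trained network they would be predicted by an additional convolutional layer. This is an illustration under those assumptions, not the network described in the paper.

```python
import numpy as np


def bilinear_sample(feature, y, x):
    """Sample a 2-D feature map at a fractional location (y, x).

    Positions outside the map contribute zero (an assumed padding rule).
    """
    h, w = feature.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    value = 0.0
    # Visit the four integer neighbours of (y, x).
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * feature[yi, xi]
    return value


def deformable_conv_at(feature, kernel, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    offsets has shape (3, 3, 2) and holds a (dy, dx) shift per kernel tap.
    With all-zero offsets this reduces to an ordinary 3x3 convolution
    (cross-correlation form, as used in deep learning libraries).
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i, j]
            out += kernel[i, j] * bilinear_sample(
                feature, cy + (i - 1) + dy, cx + (j - 1) + dx
            )
    return out
```

With zero offsets the result matches a regular convolution; non-zero offsets let the sampling grid bend towards a deformed object, which is the property the abstract relies on for handling geometric transformations.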

Highlights

  • The main purpose of object detection is to identify and locate one or more effective targets in still images or video data.

  • Compared with other object detection algorithms, our frames per second (FPS) increases approximately threefold relative to the regions with convolutional neural networks (R-CNN) series, and our mean average precision (MAP) is approximately 7% higher than that of the Single Shot MultiBox Detector (SSD) and You Only Look Once (YOLO) series.

  • Italic values refer to the highest MAP value and the highest FPS value, respectively. We control the sizes of the input images under different models to design multiple sets of comparison experiments, and compare the object detection accuracy and speed. Low-level features are fused by up-sampling to extract target object position information.
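
The up-sampling fusion mentioned above can be sketched as follows. This is a minimal NumPy example that assumes nearest-neighbour up-sampling and element-wise addition; the source only states that features are fused by up-sampling, so both choices, as well as the names `upsample_nearest` and `fuse_top_down`, are illustrative assumptions.

```python
import numpy as np


def upsample_nearest(feature, factor=2):
    """Nearest-neighbour up-sampling of a (C, H, W) feature map."""
    return feature.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_top_down(fine, coarse):
    """Fuse a coarse, high-level map into a finer, low-level map.

    Assumes both maps have the same channel count and that the fine map's
    spatial size is an integer multiple of the coarse map's.
    """
    factor = fine.shape[1] // coarse.shape[1]
    # Element-wise addition is an assumed fusion rule; concatenation
    # along the channel axis would be another common choice.
    return fine + upsample_nearest(coarse, factor)
```

The fused map keeps the spatial resolution of the low-level features, which carry position detail, while adding the semantic content of the coarser map.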

Summary

Introduction

The main purpose of object detection is to identify and locate one or more effective targets in still images or video data. It draws on a variety of important techniques, such as image processing, pattern recognition, artificial intelligence and machine learning. In traditional methods, the SVM (support vector machine) or AdaBoost algorithms are used for classification in order to obtain target information. These traditional feature extraction models are only able to capture low-level feature information, such as contour and texture information, and are limited in detecting multiple targets in complex scenes due to their poor generalization performance. The R-CNN model has two operation stages (candidate region proposal and further detection) that allow for higher detection accuracy, while SSD and YOLO directly predict the classification and position information, improving the detection speed.
