Object Detection and Localization in Natural Scenes Through Single-Step and Two-Step Models

Aneela Aslam, Nudrat Nida, Aun Irtaza

doi:10.1109/icetst49965.2020.9080728

Abstract

State-of-the-art (SOTA) object detectors fall into two types: two-stage detectors (Mask R-CNN, Faster R-CNN, Fast R-CNN) and one-stage detectors (SSD and YOLO). Two-stage detectors first generate region proposals and then extract deep features from those proposals for bounding-box regression and classification. These models achieve higher accuracy but are slower. One-stage detectors skip the separate region-proposal step: they take the image as input and perform detection directly through regression and classification. As a result, they show lower accuracy but run faster than two-stage detectors. In this research, we examine both types of detectors, including Mask R-CNN, SSD, and RetinaNet, and compare them while varying the backbone CNN architecture (Inception V2 and ResNet-50). The methods are evaluated on subsets of the challenging PASCAL VOC 2012 and MS-COCO datasets.
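Both detector families described above rely on bounding-box regression, and evaluation on PASCAL VOC and MS-COCO matches predicted boxes to ground truth by intersection-over-union (IoU). The sketch below is an illustration only, not the authors' implementation. It assumes corner-format boxes (x1, y1, x2, y2) and the center/log-size anchor-offset encoding popularized by Faster R-CNN, which anchor-based one-stage detectors such as SSD and RetinaNet use in similar form (SSD additionally scales the targets by variance constants, omitted here). The helper names `iou`, `encode`, and `decode` are hypothetical.

```python
import math
from typing import Tuple

# Assumed box convention: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]
Deltas = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def _center_size(box: Box) -> Tuple[float, float, float, float]:
    """Convert a corner-format box to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h


def encode(gt: Box, anchor: Box) -> Deltas:
    """Regression targets (tx, ty, tw, th) of a ground-truth box relative to an anchor.

    Center offsets are normalized by the anchor size; width and height use log ratios
    (assumed Faster R-CNN-style parameterization).
    """
    gx, gy, gw, gh = _center_size(gt)
    ax, ay, aw, ah = _center_size(anchor)
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))


def decode(deltas: Deltas, anchor: Box) -> Box:
    """Inverse of encode: apply predicted offsets to an anchor to recover a box."""
    tx, ty, tw, th = deltas
    ax, ay, aw, ah = _center_size(anchor)
    cx, cy = ax + tx * aw, ay + ty * ah
    w, h = aw * math.exp(tw), ah * math.exp(th)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


if __name__ == "__main__":
    anchor = (0.0, 0.0, 10.0, 10.0)
    gt = (2.0, 2.0, 12.0, 12.0)
    print(round(iou(anchor, gt), 4))  # 64 / 136, about 0.4706
    targets = encode(gt, anchor)  # what the regression head is trained to predict
    print(targets)
    print(decode(targets, anchor))  # recovers the ground-truth box
```

In an anchor-based detector, anchors whose IoU with a ground-truth box exceeds a chosen threshold are treated as positives and trained to regress the `encode` targets; at inference, `decode` turns predicted offsets back into boxes. PASCAL VOC's standard metric counts a detection as correct at IoU of at least 0.5, so the example pair above (IoU about 0.47) would not count as a match.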
