Abstract

Most contemporary object detection approaches assume each object instance in the training data to be uniquely represented by a single bounding box. In this paper, we go beyond this conventional view by allowing an object instance to be described by multiple bounding boxes. The new bounding box annotations are determined based on the alignment of an object instance with the other training instances in the dataset. Our proposal enables the training data to be reused multiple times for training richer multi-component category models. We operationalize this idea by two complementary operations: bounding box shrinking, which finds subregions of an object instance that could be shared; and bounding box enlarging, which enlarges object instances to include local contextual cues. We empirically validate our approach on the PASCAL VOC detection dataset.

Highlights

  • Consider the task of building a sliding-window object detector

  • The human-labeled “bicycle” bounding box is indicated by the solid green box. Given this ground-truth framing for the object instance, it is most similar to instances in the “45°-view bicycle” subcategory, so, in a standard mixture-model detector, it would be assigned to subcategory 1

  • Most sliding-window detection approaches continue to use features computed only within the object bounding box to train the classifier. This is because the local context around the bounding box is highly multimodal for the harder PASCAL or MIT-SUN09 datasets; e.g., a horse jumping over a fence appears in a different context than a close-up shot of a horse


Summary

Introduction

Consider the task of building a sliding-window object detector. The standard learning-based approach is to first turn each human-labeled bounding box into a feature vector using some feature descriptor, e.g. HOG, and then train a classifier, e.g. an SVM, on a stack of these feature vectors. The recent success of the discriminatively-trained mixture model framework of Felzenszwalb et al. [8] has led to the wide popularity of such models for object detection [14, 17, 18, 20, 23]. We operationalize the idea of describing an object instance by multiple bounding boxes through two complementary operations: bounding box shrinking, which aims to find subregions of an instance that could be shared; and bounding box enlarging, which aims to create new subcategories by enlarging instances to include their local context. We show that these operations create more training data for each subcategory, and improve object detection performance, especially for occluded/truncated instances.
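
The two operations can be pictured as plain geometric transforms of a ground-truth box. The following is a minimal sketch of that geometry only, not the authors' procedure: in the paper the new boxes are determined by aligning an instance with other training instances, whereas here the amount of shrinking or enlarging is passed in by hand. The names `Box`, `shrink`, and `enlarge`, and all parameters, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1


def shrink(box: Box, left: float = 0.0, top: float = 0.0,
           right: float = 0.0, bottom: float = 0.0) -> Box:
    """Trim a fraction of the width/height from each side.

    Assumption: the fractions are supplied by the caller. The paper
    instead selects the subregion by alignment with other instances.
    """
    if min(left, top, right, bottom) < 0:
        raise ValueError("fractions must be non-negative")
    if left + right >= 1 or top + bottom >= 1:
        raise ValueError("shrinking would remove the whole box")
    return Box(
        box.x1 + left * box.width,
        box.y1 + top * box.height,
        box.x2 - right * box.width,
        box.y2 - bottom * box.height,
    )


def enlarge(box: Box, scale: float, img_w: float, img_h: float) -> Box:
    """Grow the box about its center by `scale`, clipped to the image."""
    if scale < 1:
        raise ValueError("scale must be at least 1")
    cx = (box.x1 + box.x2) / 2
    cy = (box.y1 + box.y2) / 2
    half_w = box.width * scale / 2
    half_h = box.height * scale / 2
    return Box(
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(img_w), cx + half_w),
        min(float(img_h), cy + half_h),
    )


# One annotated instance yields several training boxes.
ground_truth = Box(100, 50, 300, 150)
upper_half = shrink(ground_truth, bottom=0.5)       # e.g. lower part occluded
with_context = enlarge(ground_truth, 1.5, 500, 375)  # include surroundings
```

In this sketch each derived box would be fed to the same feature-extraction and classifier-training pipeline as the original annotation, which is how a single labeled instance could contribute to more than one subcategory.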

Overview
Related Work
Approach
Shrinking Ground-truth Boxes
Enlarging Ground-truth Boxes
Initialization
Experimental Analysis
Findings
Conclusion
