Abstract

The object in an image carries the main information for image classification. When the background is complex or the object is small, existing invariant features such as Scale-Invariant Feature Transform (SIFT) or Speeded-Up Robust Features (SURF) are difficult to use for object-level representation: because SIFT cannot distinguish whether a feature contains relevant object information, the resulting representation may consist of background or otherwise uninformative features. We instead use Detection Transformer (DETR), a state-of-the-art object detector, to represent object-level information. By visualizing the attention maps of the Transformer decoder, we find that each output vector effectively indicates an object region. Bag of Visual Words (BoVW) is then applied to represent the N output vectors of DETR as the feature of a query image. On scene-level and object-level datasets, we compare our method with SIFT-based BoVW on an image classification task, and show that the proposed method performs better than SIFT-based BoVW on the object-level dataset.
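The BoVW step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the decoder outputs and the codebook are random placeholders (in the paper, the N vectors come from the DETR Transformer decoder, and a visual vocabulary would be fit, e.g. with k-means, on training-set vectors). The names `bovw_histogram`, `N_QUERIES`, `DIM`, and `K` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for DETR decoder outputs: N object-query vectors of dimension D
# per image. (Placeholder data; the paper takes these from the decoder.)
N_QUERIES, DIM, K = 100, 256, 32

# Codebook of K visual words. Random here; in practice it would be fit
# with k-means on decoder output vectors from the training set.
codebook = rng.normal(size=(K, DIM))

def bovw_histogram(vectors: np.ndarray) -> np.ndarray:
    """Assign each vector to its nearest visual word and return an
    L1-normalised K-bin histogram, used as the image-level feature."""
    # Squared Euclidean distance from every vector to every codeword.
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / hist.sum()

# One query image's N decoder vectors -> one K-dimensional feature.
query_feats = rng.normal(size=(N_QUERIES, DIM))
h = bovw_histogram(query_feats)
print(h.shape)  # (32,)
```

The resulting histogram can then be fed to any standard classifier, exactly as in SIFT-based BoVW, with the only change being the source of the local descriptors.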
