Abstract

In recent years, leading object detection methods such as R-CNN have implemented the task as a combination of region proposal generation and supervised classification of the proposed bounding boxes. Although this pipeline has achieved state-of-the-art results on multiple datasets, it has inherent limitations that make object detection computationally complex and inefficient. Instead of following this standard strategy, in this paper we enhance Detection Transformers (DETR), which tackles object detection as a set-prediction problem directly, in an end-to-end, fully differentiable pipeline without requiring priors. In particular, we incorporate Feature Pyramids (FP) into the DETR architecture and demonstrate the effectiveness of the resulting DETR-FP approach at improving logo detection results thanks to better detection of small logos. Thus, without requiring any domain-specific prior to be fed to the model, DETR-FP obtains competitive results on the OpenLogo and MS-COCO datasets, offering a relative improvement of up to 30% when compared to a Faster R-CNN baseline that strongly depends on hand-designed priors.
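
The core idea can be sketched as a minimal, illustrative prototype rather than the authors' implementation: multi-scale backbone features are fused by a Feature Pyramid Network, and the flattened pyramid tokens are fed to a transformer encoder-decoder that emits a fixed-size set of class and box predictions from learned object queries. The class name `DetrFPSketch`, the specific layer sizes, and the omission of positional encodings are simplifying assumptions.

```python
# Minimal DETR-FP-style sketch (illustrative only, not the paper's code).
from collections import OrderedDict

import torch
import torch.nn as nn
import torchvision
from torchvision.ops import FeaturePyramidNetwork


class DetrFPSketch(nn.Module):
    def __init__(self, num_classes=80, num_queries=100, d_model=256):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        # Stem up to C2 (stride 4); layer2-layer4 produce the C3-C5 maps fed to the pyramid.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                  backbone.maxpool, backbone.layer1)
        self.layer2, self.layer3, self.layer4 = backbone.layer2, backbone.layer3, backbone.layer4
        # Fuse the multi-scale maps into a pyramid with a shared channel width.
        self.fpn = FeaturePyramidNetwork(in_channels_list=[512, 1024, 2048], out_channels=d_model)
        self.transformer = nn.Transformer(d_model, nhead=8,
                                          num_encoder_layers=6, num_decoder_layers=6)
        self.query_embed = nn.Embedding(num_queries, d_model)   # learned object queries
        self.class_head = nn.Linear(d_model, num_classes + 1)   # +1 for the "no object" class
        self.box_head = nn.Linear(d_model, 4)                   # (cx, cy, w, h), normalized

    def forward(self, images):
        c2 = self.stem(images)
        c3 = self.layer2(c2)
        c4 = self.layer3(c3)
        c5 = self.layer4(c4)
        pyramid = self.fpn(OrderedDict([("c3", c3), ("c4", c4), ("c5", c5)]))
        # Flatten every pyramid level into one token sequence of shape (S, batch, d_model).
        # Positional encodings are omitted here to keep the sketch short.
        tokens = torch.cat([f.flatten(2).permute(2, 0, 1) for f in pyramid.values()], dim=0)
        queries = self.query_embed.weight.unsqueeze(1).repeat(1, images.size(0), 1)
        hs = self.transformer(tokens, queries)                  # (num_queries, batch, d_model)
        return self.class_head(hs), self.box_head(hs).sigmoid()


model = DetrFPSketch()
logits, boxes = model(torch.randn(1, 3, 128, 128))
print(logits.shape, boxes.shape)  # (100, 1, 81) and (100, 1, 4): a fixed-size prediction set
```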

Highlights

  • The field of object detection has made rapid progress in recent years with the advent of R-CNN [1] and its several improvements, which eventually became the standard for object detection in the Machine Learning and Computer Vision communities.

  • STATE OF THE ART The work presented in this paper proposes a purely end-to-end solution to object detection using transformers [9], expanding on previous work by incorporating a Feature Pyramid Network into the Detection Transformers (DETR) architecture and benefiting from bipartite matching losses for set prediction, encoder-decoder architectures based on the transformer, parallel decoding, and other contributions from relevant object detection methods, as described; see the matching sketch after this list.

  • This will help us gain deeper insight into DETR performance based on the self-attention feature maps of the encoder and the decoder around points of interest; see the attention illustration after this list.
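
The bipartite matching loss mentioned above first pairs every ground-truth object with exactly one predicted query, so that classification and box losses are computed on a one-to-one assignment. A toy sketch follows; the helper name and the simple class-probability plus L1-box cost are assumptions, whereas DETR's actual matching cost also weighs in a generalized-IoU term with tuned coefficients (SciPy is assumed to be available).

```python
# Toy sketch of bipartite (Hungarian) matching for set prediction (illustrative only).
import torch
from scipy.optimize import linear_sum_assignment


def match_predictions_to_targets(pred_logits, pred_boxes, tgt_labels, tgt_boxes):
    """One image: pred_logits (Q, C), pred_boxes (Q, 4), tgt_labels (T,), tgt_boxes (T, 4)."""
    prob = pred_logits.softmax(-1)                       # (Q, C)
    cost_class = -prob[:, tgt_labels]                    # (Q, T): higher probability -> lower cost
    cost_bbox = torch.cdist(pred_boxes, tgt_boxes, p=1)  # (Q, T): L1 distance between boxes
    cost = cost_bbox + cost_class
    # Hungarian algorithm: one-to-one assignment minimizing the total cost.
    pred_idx, tgt_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    return torch.as_tensor(pred_idx), torch.as_tensor(tgt_idx)


# Toy usage: 5 queries, 3 classes, 2 ground-truth objects.
logits, boxes = torch.randn(5, 3), torch.rand(5, 4)
labels, gt = torch.tensor([0, 2]), torch.rand(2, 4)
rows, cols = match_predictions_to_targets(logits, boxes, labels, gt)
print(rows, cols)  # matched query indices and the targets they are paired with
```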

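As a rough illustration of how self-attention around a point of interest can be inspected, the snippet below uses a standalone `nn.MultiheadAttention` layer as a stand-in for one encoder layer and reshapes the attention row of a chosen spatial location back onto the feature-map grid; the grid size and the chosen location are arbitrary assumptions.

```python
# Sketch: reading the self-attention paid by one feature-map location (stand-in layer).
import torch
import torch.nn as nn

h, w, d_model = 16, 16, 256
tokens = torch.randn(h * w, 1, d_model)            # flattened feature map, shape (S, batch, d)
attn = nn.MultiheadAttention(d_model, num_heads=8)
_, weights = attn(tokens, tokens, tokens, need_weights=True)  # weights: (batch, S, S)

# Attention from the token at spatial location (row=4, col=10) to every other token,
# reshaped back onto the feature-map grid for visualization.
point = 4 * w + 10
attention_map = weights[0, point].reshape(h, w)
print(attention_map.shape)  # torch.Size([16, 16])
```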

Summary

Introduction

The field of object detection has made rapid progress in recent years with the advent of R-CNN [1] and its several improvements, which eventually became the standard for object detection in the Machine Learning and Computer Vision communities. Without going into detail about the differences in precision and speed of each model, it is safe to say that they all performed very well across different object detection benchmarks at the time of their publication. All of these approaches are constrained by the same limitations that are intrinsic to object detection when implemented as supervised classification on proposed regions.

Methods
Results
Conclusion
