The evolution of object detection methods

Yibo Sun, Weitong Chen, Zhe Sun

doi:10.1016/j.engappai.2024.108458

Abstract

Object detection is one of the most important tasks in computer vision, which is itself an important branch of artificial intelligence. It aims to find objects in given pictures or videos and to locate their positions accurately. With the development of deep learning techniques, more powerful and robust algorithms have emerged that handle multi-scale, high-level features and overcome the limitations of the traditional object detection pipeline. The popularity of the transformer framework, with its self-attention mechanism, has made it possible to exploit larger datasets, and object detection methods have entered a new era. This paper first reviews the traditional object detection pipeline and gives a brief history of deep learning. It then focuses on the classification of deep learning-based object detection methods, covering both Convolutional Neural Network based and transformer-based methods. Commonly used datasets and metrics are also covered. The Convolutional Neural Network based methods mainly comprise two-stage and one-stage detectors; the Convolutional Neural Network is their underlying structure, and convolutional stages are their fundamental parts. Transformer-based models recast the traditional object detection problem as end-to-end detection, an approach that is widely used for processing images. Finally, promising future directions for object detection are listed to guide future work.

Full Text