Abstract

Scene recognition is an important computer vision task that has evolved from the study of the biological visual system. Its applications range from video surveillance and autopilot systems to robotics. Early work was based on feature engineering, involving the computation and aggregation of global and local image descriptors. Several popular image features, such as SIFT, SURF, HOG, ORB, LBP, and KAZE, have been proposed and applied to the task with successful results. Features can either be computed from the entire image on a global scale, or extracted from local sub-regions and aggregated across the image. Suitable classifier models are then trained to classify these features. This review analyzes several of the handcrafted features that have been applied to scene recognition over the past decades, and tracks the transition from traditional feature engineering to deep learning, which forms the current state of the art in computer vision. Deep learning is now widely deemed to have overtaken feature engineering in several computer vision applications, and deep convolutional neural networks and vision transformers are the current state of the art for object recognition. However, scenes from urban landscapes tend to contain similar objects, which poses a challenge to deep learning solutions for scene recognition. In our study, we provide a critical analysis of feature engineering and deep learning methodologies for scene recognition, present results on benchmark scene datasets, and conclude with a discussion of challenges and possible solutions that may facilitate more accurate scene recognition.
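The classic handcrafted-feature pipeline summarized above (local descriptor extraction, aggregation across the image, and a learned classifier) can be sketched as follows. This is a minimal illustrative sketch in Python rather than the method of any particular work reviewed here: it assumes OpenCV's ORB detector, a scikit-learn k-means codebook and linear SVM, and hypothetical images/labels inputs supplied by the caller; clustering binary ORB descriptors with Euclidean k-means is a common approximation, not the only option.

import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

N_VISUAL_WORDS = 200                      # size of the bag-of-visual-words codebook
orb = cv2.ORB_create(nfeatures=500)       # local keypoint detector/descriptor

def orb_descriptors(image_bgr):
    # Extract local ORB descriptors from sub-regions (keypoints) of the image.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, desc = orb.detectAndCompute(gray, None)
    return desc if desc is not None else np.empty((0, 32), dtype=np.uint8)

def bovw_histogram(desc, codebook):
    # Aggregate local descriptors into a normalized visual-word histogram
    # that serves as a single fixed-length feature vector for the image.
    if len(desc) == 0:
        return np.zeros(codebook.n_clusters, dtype=np.float32)
    words = codebook.predict(desc.astype(np.float32))
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float32)
    return hist / hist.sum()

def train_scene_classifier(images, labels):
    # Build the codebook from all local descriptors, aggregate per image,
    # then train a classifier on the aggregated features.
    per_image_desc = [orb_descriptors(img) for img in images]
    all_desc = np.vstack([d for d in per_image_desc if len(d)]).astype(np.float32)
    codebook = MiniBatchKMeans(n_clusters=N_VISUAL_WORDS, random_state=0, n_init=10)
    codebook.fit(all_desc)
    features = np.array([bovw_histogram(d, codebook) for d in per_image_desc])
    classifier = LinearSVC(C=1.0)
    classifier.fit(features, labels)
    return codebook, classifier

A global descriptor (for example an LBP or GIST histogram computed over the whole image) would replace the aggregation step with a single image-level feature vector, while the classifier stage stays the same.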
