Deep Learning Architectures for Object Detection and Classification

Bhaumik Vaidya,Chirag Paunwala

doi:10.1007/978-3-030-03131-2_4

Abstract

Object detection and classification have observed large amount of transformation and research after the advances in machine learning algorithms. The advancement in the computing power and data availability is complimenting this transformation in object detection. In recent times, research in the field of object detection is dominated by special type of neural network called Convolutional Neural Network (CNN). The object detection system has to localize objects in an image and accurately classify it. CNN is well suited for this task as it can accurately find features like edges, corners and even more advanced features needed to detect object. This chapter provides detailed overview on how CNN works and how it is useful in object detection and classification task. After that popular deep networks based on CNN like ResNet, VGG16, VGG19, GoogleNet and MobileNet are explained in detail. These networks worked well for object classification task but needed sliding window technique for localizing object in an image. It worked slowly as it needed to process many windows for a single image. This led to more advanced algorithms for object detection based on CNN like Convolutional Neural Network with Region proposals (R-CNN), fast R-CNN, faster R-CNN, Single shot multi-box detector (SSD) and You Only Look Once (YOLO). This chapter provides a detail explanation of how these algorithms work and comparison between them. Most of the deep learning algorithms require large amount of data and dedicated hardware like GPUs to train. To overcome this, the concept of transfer learning is discovered. In that pre-trained models of popular CNN architecture are used to solve new problems. So in the last part of the chapter this concept of transfer learning and when it is useful is explained.

Full Text