Abstract

Convolutional neural networks have pushed forward image analysis research and computer vision over the last decade, constituting a state-of-the-art approach in object detection today. The design of increasingly deeper and wider architectures has made it possible to achieve unprecedented levels of detection accuracy, albeit at the cost of both a dramatic computational burden and a large memory footprint. In such a context, cloud systems have become a mainstream technological solution due to their tremendous scalability, providing researchers and practitioners with virtually unlimited resources. However, these resources are typically made available as remote services, requiring communication over the network to be accessed, thus compromising the speed of response, availability, and security of the implemented solution. In view of these limitations, the on-device paradigm has emerged as a recent yet widely explored alternative, pursuing more compact and efficient networks to ultimately enable the execution of the derived models directly on resource-constrained client devices. This study provides an up-to-date review of the most relevant scientific research carried out in this vein, circumscribed to the object detection problem. In particular, the paper contributes to the field with a comprehensive architectural overview of both the existing lightweight object detection frameworks targeted at mobile and embedded devices, and the underlying convolutional neural networks that make up their internal structure. More specifically, it addresses the main structural-level strategies used for conceiving the various components of a detection pipeline (i.e., backbone, neck, and head), as well as the most salient techniques proposed for adapting such structures and the resulting architectures to more austere deployment environments. Finally, the study concludes with a discussion of the specific challenges and next steps to be taken to move toward a more convenient accuracy–speed trade-off.

Highlights

  • Despite being widely studied over the last three decades, object detection still represents a highly complex problem and remains an uphill challenge of great interest in research.

  • First presented in 1989 by LeCun et al. [7], convolutional neural networks (CNNs) have emerged over the last decade as a major driver of progress in image analysis and computer vision, delivering state-of-the-art results in terms of accuracy. Though statistical classifiers, such as support vector machines (SVM) [8], Random Forest [9], AdaBoost [10], or traditional neural networks, were considered the standard in computer vision for many years and had a leading role in object detection tasks, the relatively recent breakthrough of deep learning (DL) techniques represents an unquestionable leap over previous object detection research, enabling the detection of objects in more complex situations and simplifying the design process of the pursued algorithmic solutions.

  • Within the group of micro approaches, we find a wide range of options that can be categorized into two distinct subgroups. The first is a collection of techniques that focus on convolutional-filter-specific aspects or properties, such as the number of filters [107], their size in the spatial dimension [49], the number of channels [49, 52, 101, 105, 107, 130], the communication between channels [50, 54], or the number of channel groups [101]. The second encompasses methods targeting the internal structure of layers or modules, such as the exploitation of alternative operations to convolution [47, 48, 50, 52, 54, 105, 107, 128, 130, 131, 133, 134, 135, 136], the replacement [48] or omission [133] of nonlinearity, or the application of an attention mechanism [53, 132, 133].
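To make the filter-level options above concrete, the following is a minimal sketch of how the number of filters, the spatial kernel size, and the number of channel groups each affect the weight count of a single convolutional layer. The function names and the 128-channel layer are hypothetical and chosen only for illustration. The depthwise separable layer is used here as one well-known example of an alternative operation to standard convolution; the highlight itself does not name a specific operation.

```python
def conv_params(c_in, c_out, kernel, groups=1):
    """Weight count of a 2D convolution with a square kernel and no bias.

    Each of the c_out filters only sees c_in / groups input channels.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel * kernel


def depthwise_separable_params(c_in, c_out, kernel):
    """Depthwise (one filter per input channel) plus 1x1 pointwise convolution."""
    depthwise = conv_params(c_in, c_in, kernel, groups=c_in)
    pointwise = conv_params(c_in, c_out, 1)
    return depthwise + pointwise


# Hypothetical baseline layer: 128 -> 128 channels, 3x3 kernel.
standard = conv_params(128, 128, 3)
fewer_filters = conv_params(128, 64, 3)        # halve the number of filters
smaller_kernel = conv_params(128, 128, 1)      # shrink the spatial size to 1x1
grouped = conv_params(128, 128, 3, groups=4)   # split channels into 4 groups
separable = depthwise_separable_params(128, 128, 3)

for name, count in [
    ("standard 3x3", standard),
    ("fewer filters", fewer_filters),
    ("1x1 kernel", smaller_kernel),
    ("4 groups", grouped),
    ("depthwise separable", separable),
]:
    print(f"{name:20s} {count:7d} weights ({count / standard:.1%} of standard)")
```

Weight count is only a proxy: the actual latency and memory gains on a given device also depend on feature map resolution and on how well the hardware supports each operation.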

Summary

Introduction

Despite being widely studied over the last three decades, object detection still represents a highly complex problem and remains an uphill challenge of great interest in research. First presented in 1989 by LeCun et al. [7], convolutional neural networks (CNNs) have emerged over the last decade as a major driver of progress in image analysis and computer vision, delivering state-of-the-art results in terms of accuracy. Though statistical classifiers, such as support vector machines (SVM) [8], Random Forest [9], AdaBoost [10], or traditional neural networks, were considered the standard in computer vision for many years and had a leading role in object detection tasks, the relatively recent breakthrough of deep learning (DL) techniques represents an unquestionable leap over previous object detection research, enabling the detection of objects in more complex situations and simplifying the design process of the pursued algorithmic solutions. CNNs represent a comprehensive detection solution that, owing to their ability to exploit both spatial and temporal correlation of input data, enables feature representation learning to be carried out directly, with no need for domain expertise. Such expertise is essential for designing feature extraction algorithms such as the scale-invariant feature transform (SIFT) [11], histogram of oriented gradients (HOG) [12], or local binary patterns (LBP) [13], which are omnipresent among the more classical approaches.

