Abstract

CNNs (Convolutional Neural Networks) are becoming increasingly important for real-time applications, such as image classification in traffic control, visual surveillance, and smart manufacturing. Due to their complexity, however, it is challenging to meet the timing constraints of image-processing tasks that use CNNs. Performing dynamic trade-offs between inference accuracy and time is also challenging: by evaluating hundreds of CNN models in terms of time and accuracy on two popular data sets, MNIST and CIFAR-10, we observe that more complex CNNs that take longer to run often yield lower accuracy. To address these challenges, we propose a new approach that (1) generates CNN models and analyzes their average inference time and accuracy for image classification, (2) stores offline a small subset of the CNNs with monotonic time and accuracy relationships, and (3) at run time, efficiently selects the stored CNN expected to support the highest possible accuracy subject to the time remaining until the deadline. In our extensive evaluation, we verify that the CNNs derived by our approach are more flexible and cost-efficient than two baseline approaches: our approach effectively builds a compact set of CNNs and efficiently supports systematic time vs. accuracy trade-offs, if necessary, to meet the user-specified timing and accuracy requirements. Moreover, the overhead of our approach is small in terms of latency and memory consumption.
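Step (3) above can be sketched in a few lines. The sketch below assumes a hypothetical stored set of (average inference time, accuracy) pairs with the monotonic relationship the approach guarantees, so the most accurate feasible model can be found with a binary search; the actual stored models and their timings are examples, not values from the paper.

```python
import bisect

# Hypothetical Pareto-efficient set stored offline: (avg inference time
# in ms, accuracy), ascending in both fields (monotonic relationship).
PARETO_MODELS = [
    (2.0, 0.90),
    (5.0, 0.94),
    (9.0, 0.96),
    (15.0, 0.97),
]

def select_model(remaining_ms):
    """Return the most accurate stored model whose average inference
    time fits within the remaining time to the deadline, or None."""
    times = [t for t, _ in PARETO_MODELS]
    i = bisect.bisect_right(times, remaining_ms)  # models[:i] all fit
    return PARETO_MODELS[i - 1] if i > 0 else None
```

Because accuracy rises monotonically with inference time in the stored subset, the last model that fits the remaining time is also the most accurate feasible one.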

Highlights

  • To find Pareto-optimal Convolutional Neural Networks (CNNs) cost-efficiently, we take the steps illustrated in Figure 6

  • If the current set of Pareto-efficient CNN models is HP = { H1, . . . , Hi }, sorted in ascending order of accuracy and inference time, we search for Hi+1 whose accuracy, α( Hi+1 ), is higher than α( Hi ) and whose inference time, exec_time( Hi+1 ), does not exceed the deadline D, by incrementally modifying the hyperparameters of Hi within a neighborhood of the search space
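The neighborhood search over hyperparameters depends on the paper's model-generation procedure, but the Pareto-filtering invariant the highlights describe, keeping only models whose accuracy strictly increases with inference time and discarding models slower than the deadline, can be sketched as follows (candidate values are illustrative, not measurements from the paper):

```python
def build_pareto_set(candidates, deadline):
    """From (inference_time, accuracy) candidates, keep the subset with
    strictly increasing accuracy and non-decreasing inference time,
    discarding any model slower than the deadline."""
    feasible = sorted(c for c in candidates if c[0] <= deadline)
    pareto, best_acc = [], -1.0
    for t, a in feasible:
        if a > best_acc:      # slower but less accurate models are dropped
            pareto.append((t, a))
            best_acc = a
    return pareto
```

The resulting subset is exactly the monotonic time/accuracy set that makes the run-time model selection well defined.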


Introduction

Machine learning [1] has numerous applications, including image processing [2], natural language processing [3], and recommendation systems [4]. A CNN model consists of an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. When an input image is provided, a CNN extracts features from the image using multiple pairs of convolutional and pooling layers and classifies the image into a class using fully connected layers. The number of convolutional and pooling layers varies depending on the application and accuracy requirements. Generally (but not necessarily), the deeper the network, the higher the accuracy, with potentially diminishing returns.
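The effect of stacking convolution and pooling pairs can be illustrated with a small shape calculation. The sketch below assumes a LeNet-style layout on a 28x28 MNIST-sized input with 5x5 'valid' convolutions and 2x2 non-overlapping pooling; these layer sizes are illustrative, not the paper's generated models.

```python
def conv_out(n, kernel, stride=1):
    """Spatial size after a 'valid' convolution."""
    return (n - kernel) // stride + 1

def pool_out(n, pool=2):
    """Spatial size after non-overlapping pooling."""
    return n // pool

size = 28                      # MNIST-sized input
for kernel in (5, 5):          # two conv+pool pairs
    size = pool_out(conv_out(size, kernel))
print(size)                    # prints 4: 28 -> 24 -> 12 -> 8 -> 4
```

Each pair shrinks the feature map, so the flattened input to the fully connected layers, and hence the inference cost, depends directly on how many pairs the generated model uses.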
