Abstract

Recent increases in computational power and the development of specialized architectures have made it possible to perform machine learning, and especially inference, on the edge. OpenVINO is a toolkit built around convolutional neural networks that facilitates the fast-track development of computer vision algorithms and deep learning neural networks into vision applications, and enables their easy heterogeneous execution across hardware platforms. Smart queue management can be key to the success of any sector. In this paper, we focus on edge deployments to make the smart queuing system (SQS) accessible to all, including the ability to run it on cheap devices. The queuing system's deep learning algorithms can therefore run on pre-existing computers that a retail store, public transportation facility, or factory may already possess, considerably reducing the cost of deploying such a system. SQS demonstrates how to create a video AI solution on the edge. We validate our results by testing the system on multiple edge devices, namely the CPU, the integrated edge graphics processing unit (iGPU), the vision processing unit (VPU), and field-programmable gate arrays (FPGAs). Experimental results show that deploying an SQS on the edge is very promising.
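
As a rough illustration (not code from the paper), the sketch below shows how a single OpenVINO model can be retargeted across the devices evaluated here by changing only the device string. "CPU", "GPU" (iGPU), "MYRIAD" (VPU), and "HETERO:FPGA,CPU" are standard OpenVINO plugin names; the model file names are placeholders, and API details vary across OpenVINO releases.

```python
# Minimal device-portable inference sketch using OpenVINO's Inference
# Engine Python API. Model file names below are placeholders.
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")

# Swap the device string ("CPU", "GPU", "MYRIAD", "HETERO:FPGA,CPU")
# to retarget the same intermediate representation without code changes.
exec_net = ie.load_network(network=net, device_name="CPU")

input_blob = next(iter(net.input_info))
n, c, h, w = net.input_info[input_blob].input_data.shape
frame = np.zeros((n, c, h, w), dtype=np.float32)  # stands in for a video frame

result = exec_net.infer(inputs={input_blob: frame})
```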

Highlights

  • We can define a queue as an arrangement of people or vehicles waiting in line for their turn to receive a service or move forward in an activity, while queuing is the act of taking a place in such an arrangement (Lee 2019)

  • The optimization takes advantage of more powerful and specialized hardware with accelerators

  • Our experiments show that per-channel quantization is needed to compensate for the accuracy drop resulting from quantization; asymmetric per-layer quantization seems to work best for this while also serving as a good baseline for post-training quantization of weights and activations (see the sketch below)
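
As a hedged illustration of the difference between these two schemes (not the paper's code), the NumPy sketch below quantizes a random convolution weight tensor once with a single asymmetric per-layer (per-tensor) scale and once with symmetric per-channel scales; per-channel scaling compensates for output channels whose weight ranges differ widely.

```python
# Contrast of asymmetric per-layer vs. symmetric per-channel 8-bit
# weight quantization on a dummy convolution kernel.
import numpy as np

w = np.random.randn(16, 3, 3, 3).astype(np.float32)  # (out_ch, in_ch, kH, kW)

# Asymmetric per-layer: one scale and zero-point for the whole tensor.
lo, hi = w.min(), w.max()
scale = (hi - lo) / 255.0
zero_point = np.round(-lo / scale)
q_layer = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)

# Symmetric per-channel: one scale per output channel.
max_abs = np.abs(w).max(axis=(1, 2, 3), keepdims=True)
scales = max_abs / 127.0
q_channel = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
```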

Introduction

We can define a queue as an arrangement of people or vehicles waiting in line for their turn to receive a service or move forward in an activity, while queuing is the act of taking a place in such an arrangement (Lee 2019). The proposed system performs the computation on the device itself and therefore provides lower latency, since the algorithms run locally (Zhang 2017). This is helpful in retail or transportation scenarios where low application latency is required. We propose a system that is accessible and implementable, requires very low deployment costs, consumes less power, places little load on the network, and offers lower latency. We ensure privacy by not sending the video stream itself but a transformed version of it, i.e., a small locally trained model. This model is sent to a managed server, which uses it to improve or update the deployed model, making it better at performing inferences. The last section concludes the article and outlines future work.
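
The update path described above can be pictured with the hedged sketch below: instead of streaming video, the device uploads a small serialized model to the managed server. The server URL, endpoint, and payload format are hypothetical, not taken from the paper.

```python
# Hypothetical upload of a locally trained model update; the raw video
# never leaves the device.
import requests

payload = b"\x00" * 1024  # stands in for the small serialized local model

resp = requests.post("https://sqs.example.com/model-updates",  # hypothetical endpoint
                     files={"model": ("update.bin", payload)},
                     timeout=30)
resp.raise_for_status()  # the managed server merges updates into the shared model
```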

Related works
Proposed architecture
Experimental setup
Freezing models
Optimizing models
Converting models to an intermediate representation
Post training optimization
Comparing models with deep learning workbench
Identifying hotspots in the application
Testing the application for production
Conclusion