Abstract

Artificial intelligence at the edge can help solve complex tasks in sectors such as automotive, healthcare, and surveillance. However, constrained by the limited computational power of edge devices, artificial intelligence models must be adapted. Many model compression approaches have been developed and quantified over the years to tackle this problem, but few have considered the overhead of on-device model compression, even though compression itself can take a considerable amount of time. With this added metric, we provide a more complete view of the efficiency of model compression at the edge. The objective of this research is to identify the benefits of compression methods and the trade-off between size and latency reduction versus accuracy loss and compression time on edge devices. In this work, a quantitative method is used to analyze and rank three common model compression techniques (post-training quantization, unstructured pruning, and knowledge distillation) on the basis of accuracy, latency, model size, and time-to-compress overhead. We conclude that knowledge distillation performs best, with the potential for up to 11.4x model size reduction and a 78.67% latency speed-up, at a moderate cost in accuracy and compression time.
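As a minimal sketch of how two of the three methods and the time-to-compress metric might be measured, the snippet below applies PyTorch post-training dynamic quantization and L1 unstructured pruning to a hypothetical toy model. The model, layer sizes, and 50% sparsity level are illustrative assumptions, not the paper's experimental setup; knowledge distillation is omitted because it requires a full training loop.

```python
import io
import time

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical stand-in for an edge model; the paper's actual models
# and datasets are not specified in this abstract.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
).eval()

def serialized_size(m: nn.Module) -> int:
    """Serialized state_dict size in bytes, a proxy for on-disk model size."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes

baseline_bytes = serialized_size(model)

# 1) Post-training dynamic quantization: weights are stored as int8 and
#    activations are quantized on the fly at inference time. Timing the
#    call mirrors the paper's "time to compress" overhead metric.
t0 = time.perf_counter()
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
compress_seconds = time.perf_counter() - t0

# 2) Unstructured pruning: zero out the 50% of weights with the smallest
#    L1 magnitude in each Linear layer. Note that zeroed weights shrink
#    on-disk size only after sparse encoding or generic compression.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the pruning permanent

print(f"fp32 baseline:  {baseline_bytes} bytes")
print(f"int8 quantized: {serialized_size(quantized)} bytes")
print(f"quantization took {compress_seconds:.4f} s")
```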
