Abstract

Model quantization can reduce model size and computational latency, and it has been successfully applied in many applications on mobile phones, embedded devices, and smart chips. Mixed-precision quantization assigns different bit precisions to different layers according to their sensitivity, which can yield strong performance. However, it is difficult to quickly determine the quantization bit precision of each layer in a deep neural network under given constraints (for example, hardware resources, energy consumption, model size, and computational latency). In this article, a novel sequential single-path search (SSPS) method for mixed-precision model quantization is proposed, in which the given constraints are introduced to guide the search process. A single-path search cell is proposed to construct a fully differentiable supernet, which can be optimized by gradient-based algorithms. Moreover, we sequentially determine the candidate precision of each layer according to its selection certainty, which exponentially reduces the search space and speeds up the convergence of the search process. Experiments show that our method efficiently searches mixed-precision models for different architectures (for example, ResNet-20, 18, 34, 50, and MobileNet-V2) and datasets (for example, CIFAR-10, ImageNet, and COCO) under given constraints, and the searched models significantly outperform their uniform-precision counterparts.
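
For concreteness, below is a minimal sketch of how a differentiable single-path search cell and the sequential precision fixing could be implemented, assuming a PyTorch-style setup; the candidate bit-widths, the symmetric fake-quantization scheme, and the certainty threshold are illustrative assumptions rather than the paper's exact formulation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def fake_quantize(x, bits):
        """Symmetric uniform fake quantization with a straight-through estimator."""
        qmax = 2 ** (bits - 1) - 1
        scale = x.detach().abs().max().clamp(min=1e-8) / qmax
        q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
        return x + (q - x).detach()  # forward: quantized values; backward: identity

    class SinglePathSearchCell(nn.Module):
        """Convolution whose weight precision is chosen by learnable search logits."""
        def __init__(self, in_ch, out_ch, candidate_bits=(2, 4, 8)):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
            self.candidate_bits = list(candidate_bits)
            self.alpha = nn.Parameter(torch.zeros(len(candidate_bits)))  # search logits
            self.fixed_bits = None  # set once this layer's precision is decided

        def selection_certainty(self):
            """Max softmax probability over candidates: how decided this cell is."""
            return F.softmax(self.alpha, dim=0).max().item()

        def forward(self, x):
            if self.fixed_bits is not None:
                w = fake_quantize(self.conv.weight, self.fixed_bits)
                return F.conv2d(x, w, padding=1)
            # Differentiable relaxation: a probability-weighted sum over the
            # candidate-precision branches, sharing a single set of weights,
            # so gradients reach both the weights and the search logits.
            probs = F.softmax(self.alpha, dim=0)
            out = 0
            for p, b in zip(probs, self.candidate_bits):
                w = fake_quantize(self.conv.weight, b)
                out = out + p * F.conv2d(x, w, padding=1)
            return out

    def sequentially_fix(cells, certainty_threshold=0.9):
        """Freeze the precision of any cell whose selection certainty is high enough,
        shrinking the remaining search space after each search epoch."""
        for cell in cells:
            if cell.fixed_bits is None and cell.selection_certainty() > certainty_threshold:
                idx = int(torch.argmax(cell.alpha))
                cell.fixed_bits = cell.candidate_bits[idx]

In this sketch, sequentially_fix would be called after each search epoch; once a cell's certainty exceeds the threshold its precision is frozen, and only the undecided cells continue to contribute search logits in later epochs, which mirrors the sequential reduction of the search space described above.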
