Abstract

Mixed-precision quantization, where more sensitive layers are kept at higher precision, can achieve the trade-off between accuracy and complexity of neural networks. However, the search space for mixed-precision grows exponentially with the number of layers, making the brute force approach infeasible on deep networks. To reduce this exponential search space, recent efforts use Pareto frontier or integer linear programming to select the bit-precision of each layer. Unfortunately, we find that these prior works rely on a single constraint. In practice, model complexity includes space complexity and time complexity, and the two are weakly correlated, thus using simply one as a constraint leads to sub-optimal results. Besides this, they require manually set constraints, making them only pseudo-automatic. To address the above issues, we propose High-dimensional Trade-off Quantization (HTQ), which automatically determines the bit-precision in the high-dimensional space of model accuracy, space complexity, and time complexity without any manual intervention. Specifically, we use the saliency criterion based on connection sensitivity to indicate the accuracy perturbation after quantization, which performs similarly to Hessian information but can be calculated quickly (more than 1000× speedup). The bit-precision is then automatically selected according to the three-dimensional (3D) Pareto frontier of the total perturbation, model size, and bit operations (BOPs) without manual constraints. Moreover, HTQ allows for the joint optimization of weights and activations, and thus the bit-precisions of both can be computed concurrently. Compared to state-of-the-art methods, HTQ achieves higher accuracy and lower space/time complexity on various model architectures for image classification and object detection tasks. Code is available at: https://github.com/zkkli/HTQ.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.