Abstract
This paper develops a robust model compression approach for neural networks via parameter quantization. Traditionally, quantized neural networks (QNNs) were constructed with binary or ternary weights that were deterministic. This paper generalizes QNNs in two directions. First, an M-ary QNN is developed to adjust the balance between memory storage and model capacity. The representation values and the quantization partitions in M-ary quantization are mutually estimated to enhance the resolution of gradients in neural network training. A flexible quantization with asymmetric partitions is formulated. Second, variational inference is incorporated to implement the Bayesian asymmetric QNN. The uncertainty of weights is faithfully represented to enhance the robustness of the trained model in the presence of heterogeneous data. Importantly, a multiple spike-and-slab prior is proposed to represent the quantization levels in Bayesian asymmetric learning. M-ary quantization is then optimized by maximizing the evidence lower bound of the classification network. An adaptive parameter space is built to implement Bayesian quantization and neural representation. Experiments on various image recognition tasks show that the M-ary QNN achieves performance comparable to the full-precision neural network (FPNN), while the memory cost and the test time are significantly reduced relative to the FPNN. The merit of the Bayesian M-ary QNN using the multiple spike-and-slab prior is also investigated.
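The mutual estimation of representation values and asymmetric partitions can be pictured as an alternating update of M levels and their boundary thresholds. The sketch below is only an illustration of that idea under a Lloyd-Max-style reading; the function name, the quantile initialization, and the update rule are assumptions for exposition and are not the paper's algorithm.

```python
# Minimal sketch (assumed, not the authors' implementation) of M-ary
# asymmetric quantization: representation values and partition thresholds
# are re-estimated in alternation, so the partitions need not be symmetric.
import numpy as np

def mary_quantize(weights, M=4, n_iter=20):
    """Quantize a weight tensor to M asymmetric representation levels."""
    w = weights.ravel()
    # Initialize the M representation values from quantiles of the weights.
    levels = np.quantile(w, np.linspace(0.0, 1.0, M))
    assign = np.zeros_like(w, dtype=int)
    for _ in range(n_iter):
        # Partition step: thresholds are midpoints between adjacent levels.
        thresholds = (levels[:-1] + levels[1:]) / 2.0
        assign = np.digitize(w, thresholds)
        # Representation step: each level becomes the mean of its members.
        for m in range(M):
            if np.any(assign == m):
                levels[m] = w[assign == m].mean()
    return levels[assign].reshape(weights.shape), levels

# Example: quantize a random weight matrix to 4 levels (2 bits per weight).
W = np.random.randn(256, 256).astype(np.float32)
W_q, levels = mary_quantize(W, M=4)
```

In the Bayesian variant described in the abstract, such point estimates of the levels would instead be treated through a multiple spike-and-slab prior and learned by maximizing the evidence lower bound; that part is not reflected in this sketch.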