Abstract

This paper develops a robust model compression method for neural networks via parameter quantization. Traditionally, quantized neural networks (QNNs) have been constructed with deterministic binary or ternary weights. This paper generalizes QNN in two directions. First, M-ary QNN is developed to adjust the balance between memory storage and model capacity. The representation values and the quantization partitions in M-ary quantization are mutually estimated to enhance the resolution of gradients in neural network training. A flexible quantization with asymmetric partitions is formulated. Second, variational inference is incorporated to implement a Bayesian asymmetric QNN. The uncertainty of the weights is faithfully represented to enhance the robustness of the trained model in the presence of heterogeneous data. Importantly, a multiple spike-and-slab prior is proposed to represent the quantization levels in Bayesian asymmetric learning. M-ary quantization is then optimized by maximizing the evidence lower bound of the classification network. An adaptive parameter space is built to implement Bayesian quantization and neural representation. Experiments on various image recognition tasks show that M-ary QNN achieves performance comparable to the full-precision neural network (FPNN), while the memory cost and test time are significantly reduced relative to FPNN. The merit of Bayesian M-ary QNN using the multiple spike-and-slab prior is also investigated.
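To make the idea of M-ary quantization with asymmetric partitions concrete, the following Python sketch alternately re-estimates the M representation values and the M-1 partition thresholds in a Lloyd-Max style. The function names, initialization, and update schedule here are illustrative assumptions, not the paper's exact procedure, which couples this mutual estimation with gradient-based network training and, in the Bayesian case, with the multiple spike-and-slab prior and ELBO maximization.

import numpy as np

def m_ary_quantize(w, reps, thresholds):
    # Map each full-precision weight to one of M representation values
    # using M-1 (possibly asymmetric) partition thresholds.
    return reps[np.searchsorted(thresholds, w)]

def fit_m_ary(w, M=4, iters=20):
    # Alternately update partitions and representation values in the
    # spirit of Lloyd-Max quantization; a stand-in for the paper's
    # mutual estimation, not the authors' training algorithm.
    reps = np.quantile(w, np.linspace(0.1, 0.9, M))    # initial levels
    for _ in range(iters):
        thresholds = (reps[:-1] + reps[1:]) / 2.0      # midpoint partitions
        idx = np.searchsorted(thresholds, w)
        reps = np.array([w[idx == m].mean() if np.any(idx == m) else reps[m]
                         for m in range(M)])
    return reps, (reps[:-1] + reps[1:]) / 2.0

# Example usage on synthetic weights:
#   w = np.random.randn(10000)
#   reps, thresholds = fit_m_ary(w, M=4)
#   w_q = m_ary_quantize(w, reps, thresholds)

Because the representation values are learned rather than fixed to {-1, 0, +1}, the resulting partitions need not be symmetric around zero, which is what allows the trade-off between memory storage (log2 M bits per weight) and model capacity described above.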
