Abstract

To achieve efficient inference with a hardware-friendly design, Adder Neural Networks (ANNs) replace the expensive multiplication operations in Convolutional Neural Networks (CNNs) with cheap additions by using the l1-norm as the similarity measurement instead of cosine distance. However, we observe an increasing accuracy gap between CNNs and ANNs as the number of parameters is reduced, and this gap cannot be eliminated by existing algorithms. In this paper, we present a simple yet effective Norm-Guided Distillation (NGD) method that enables l1-norm ANNs to learn superior performance from l2-norm ANNs. Although CNNs achieve accuracy similar to that of l2-norm ANNs, the clustering behavior induced by the l2 distance is much easier for l1-norm ANNs to learn than the cross-correlation used in CNNs. To amplify this advantage, the features in l2-norm ANNs are encouraged to achieve intra-class centralization and inter-class decentralization. Furthermore, the roughly estimated gradients of vanilla ANNs are replaced with a progressive approximation from the l2-norm to the l1-norm, allowing more accurate optimization. Extensive evaluations on several benchmarks demonstrate the effectiveness of NGD on lightweight networks. For example, our method improves the ANN baseline by 10.43% with 0.25× GhostNet on CIFAR-100 and by 3.1% with 1.0× GhostNet on ImageNet.
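To make the two similarity measurements concrete, below is a minimal PyTorch-style sketch, not taken from the paper, of an adder-style layer that scores every input patch against every filter with a negative lp distance, plus a hypothetical schedule that anneals p from 2 (l2-norm) toward 1 (l1-norm) in the spirit of the progressive gradient approximation described above. The helper names adder_similarity and p_schedule are illustrative assumptions, not the authors' API.

```python
import torch
import torch.nn.functional as F

def adder_similarity(x, weight, p=1.0):
    """Adder-style similarity: negative sum of |patch - filter|^p.

    With p=1 this is the l1-norm measurement of ANNs (additions only);
    with p=2 it is the (squared) l2 distance used by the teacher.
    x:      (N, C, H, W) input feature map
    weight: (K, C, kh, kw) filters
    returns (N, K, L) scores over L sliding-window locations.
    """
    K, C, kh, kw = weight.shape
    # Extract sliding patches: (N, C*kh*kw, L).
    patches = F.unfold(x, kernel_size=(kh, kw))
    w = weight.view(K, -1)                                       # (K, C*kh*kw)
    # Pairwise differences between every patch and every filter.
    diff = patches.unsqueeze(1) - w.unsqueeze(0).unsqueeze(-1)   # (N, K, C*kh*kw, L)
    return -diff.abs().pow(p).sum(dim=2)                         # (N, K, L)

def p_schedule(step, total_steps):
    """Hypothetical progressive schedule: p goes from 2 to 1 linearly,
    so early optimization follows the smoother l2 geometry before
    converging to the hardware-friendly l1 measurement."""
    return 2.0 - min(step / total_steps, 1.0)

# Usage sketch on random tensors.
x = torch.randn(2, 8, 16, 16)
w = torch.randn(4, 8, 3, 3)
scores_l1 = adder_similarity(x, w, p=1.0)
scores_l2 = adder_similarity(x, w, p=p_schedule(step=0, total_steps=100))
```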
