Abstract

In this work, we propose a mechanism for knowledge transfer between Convolutional Neural Networks via the geometric regularization of local features produced by the activations of convolutional layers. We formulate appropriate loss functions, driving a “student” model to adapt such that its local features exhibit similar geometrical characteristics to those of an “instructor” model at corresponding layers. The investigated functions, inspired by manifold-to-manifold distance measures, are designed to compare the neighboring information inside the feature space of the involved activations without any restriction on the features’ dimensionality, thus enabling knowledge transfer between different architectures. Experimental evidence demonstrates that the proposed technique is effective in different settings, including knowledge transfer to smaller models, transfer between different deep architectures, and harnessing knowledge from external data, producing models with increased accuracy compared to typical training. Furthermore, the results indicate that the presented method can work synergistically with methods such as knowledge distillation, further increasing the accuracy of the trained models. Finally, experiments on training with limited data show that a combined regularization scheme can achieve the same generalization as non-regularized training while using only 50% of the data in the CIFAR-10 classification task.
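The abstract does not reproduce the loss functions themselves. Purely as an illustration, a minimal sketch of one dimension-agnostic formulation in this spirit, assuming PyTorch and a cosine-similarity affinity over the spatial positions of each activation map (the function names and the specific affinity are assumptions of this sketch, not the paper's definitions), could look as follows:

```python
import torch
import torch.nn.functional as F


def local_affinity(feats: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity matrix over the spatial positions of a (B, C, H, W)
    activation map. Each position is treated as a local feature of dimension
    C; the resulting (B, H*W, H*W) matrix describes the neighboring structure
    of those features and does not depend on C, so layers of different widths
    remain comparable."""
    x = feats.flatten(2).transpose(1, 2)   # (B, H*W, C) local descriptors
    x = F.normalize(x, dim=-1)             # unit norm -> dot product = cosine
    return x @ x.transpose(1, 2)           # (B, H*W, H*W) affinity matrix


def geometry_transfer_loss(student_feats: torch.Tensor,
                           teacher_feats: torch.Tensor) -> torch.Tensor:
    """Hypothetical regularization term: penalize discrepancies between the
    local-feature geometry of the student and of the instructor at a pair of
    corresponding layers."""
    return F.mse_loss(local_affinity(student_feats),
                      local_affinity(teacher_feats.detach()))
```

Because the comparison is made between affinity matrices rather than raw features, the student and instructor channel counts need not match; only the spatial resolutions of the compared layers are assumed equal in this sketch.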

Highlights

  • Recent advancements in Convolutional Neural Networks (CNNs) have enabled revolutionary growth in several fields of machine vision and artificial intelligence [1].

  • Partly motivated by recent works demonstrating that local descriptors can be used to construct effective regularization functions that manipulate style [13] and texture [14] of images in generative tasks, we investigate ways to utilize geometric regularization of local features from intermediate layers of the trained CNN as a mechanism for knowledge transfer between models (see the sketch after this list).

  • Since the objective of this work is to propose a mechanism for knowledge transfer between models of different architectures, we utilized a collection of CNN models with different topologies.
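As a purely illustrative companion to the second highlight, the sketch below shows one common way corresponding intermediate activations could be exposed and fed to such a regularizer; the forward hooks, placeholder layer names, and reuse of the geometry_transfer_loss sketch above are assumptions, not the paper's implementation.

```python
import torch.nn as nn


def capture_activations(model: nn.Module, layer_names):
    """Register forward hooks that record the outputs of the named layers
    during a forward pass, exposing intermediate local features."""
    store, handles = {}, []
    modules = dict(model.named_modules())
    for name in layer_names:
        def hook(module, inputs, output, key=name):
            store[key] = output
        handles.append(modules[name].register_forward_hook(hook))
    return store, handles


# Hypothetical usage with corresponding layers of a student and an instructor
# (layer names are placeholders for whatever layers are paired in practice):
# s_acts, _ = capture_activations(student, ["layer2", "layer3"])
# t_acts, _ = capture_activations(instructor, ["layer2", "layer3"])
# student(images); instructor(images)
# reg = sum(geometry_transfer_loss(s_acts[k], t_acts[k]) for k in s_acts)
# total_loss = task_loss + reg_weight * reg
```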


Summary

Introduction

Recent advancements in Convolutional Neural Networks (CNNs) have enabled revolutionary growth in several fields of machine vision and artificial intelligence [1]. Their characteristic ability to generalize well in difficult visual tasks has been a key component in the diffusion of this technology into numerous applications that are based on the analysis of 2D/3D data. By enabling a neural model to harness the information stored in another trained network, the latter effectively acts as an extra source of information [4]. This could facilitate training with less data, improve the accuracy of the trained models, and allow training smaller, more efficient models that are better suited to the limitations of edge computing.
