How a student becomes a teacher: learning and forgetting through spectral methods

Lorenzo Giambagli, Lorenzo Buffoni, Lorenzo Chicchi, Duccio Fanelli

doi:10.1088/1742-5468/ad1bea

Open Access

Abstract

In theoretical machine learning, the teacher–student paradigm is often employed as an effective metaphor for real-life tuition. A student network is trained on data generated by a fixed teacher network until it matches the instructor’s ability to cope with the assigned task. This scheme proves particularly relevant when the student network is overparameterized, namely when its layers are larger than those of the underlying teacher network. Under these operating conditions, it is tempting to speculate that the student's ability to handle the given task could eventually be stored in a sub-portion of the whole network. This subnetwork should be, to some extent, reminiscent of the frozen teacher structure, according to suitable metrics, while being approximately invariant across different architectures of the candidate student network. Unfortunately, state-of-the-art conventional learning techniques cannot help in identifying such an invariant subnetwork, owing to the inherent non-convexity of the examined problem. In this work, we take a decisive leap forward by proposing a radically different optimization scheme which builds on a spectral representation of the linear transfer of information between layers. The gradient is hence calculated with respect to both eigenvalues and eigenvectors, with a negligible increase in computational load and complexity as compared to standard training algorithms. Working in this framework, we could isolate a stable student substructure that mirrors the true complexity of the teacher in terms of computing neurons, path distribution and topological attributes. When unimportant nodes of the trained student are pruned, following a ranking that reflects the optimized eigenvalues, no degradation in the recorded performance is seen above a threshold that corresponds to the effective teacher size. The observed behavior can be pictured as a genuine second-order phase transition that bears universality traits. Code is available at: https://github.com/Jamba15/Spectral-regularization-teacher-student/tree/master.
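
The eigenvalue-based pruning described in the abstract can be illustrated with a toy sketch. The abstract does not give the explicit parametrization, so the rule used here, w[i][j] = (lam_in[j] - lam_out[i]) * phi[i][j], is an assumption, as are all function and variable names and the numerical values. It is not the authors' implementation, for which the linked repository is the reference. The sketch only shows why, under this assumed form, a node whose eigenvalue is zero can be removed without changing the output of a linear layer.

```python
# Toy sketch (assumed parametrization, hypothetical names): each weight is
# the product of an eigenvector entry and a difference of eigenvalues,
#   w[i][j] = (lam_in[j] - lam_out[i]) * phi[i][j]
# so a sending node j with lam_in[j] == 0 (and lam_out == 0) carries no signal.

def spectral_weights(phi, lam_in, lam_out):
    """Build the (n_out x n_in) weight matrix from the spectral parameters."""
    return [[(lam_in[j] - lam_out[i]) * phi[i][j]
             for j in range(len(lam_in))]
            for i in range(len(lam_out))]

def apply_layer(weights, x):
    """Linear transfer y = W x (no bias, no nonlinearity)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]

def keep_top_nodes(lam_in, n_keep):
    """Indices of the n_keep sending nodes with the largest |eigenvalue|."""
    ranked = sorted(range(len(lam_in)),
                    key=lambda j: abs(lam_in[j]), reverse=True)
    return sorted(ranked[:n_keep])

# Illustrative values only: 5 sending nodes, 2 receiving nodes.
phi = [[0.5, -1.0, 2.0, 0.3, 1.0],
       [1.5, 0.7, -0.5, 0.9, -2.0]]
lam_in = [1.0, 0.0, -2.0, 0.0, 0.5]   # two nodes have a zero eigenvalue
lam_out = [0.0, 0.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]

full = apply_layer(spectral_weights(phi, lam_in, lam_out), x)

# Prune down to the three nodes with the largest |eigenvalue|.
kept = keep_top_nodes(lam_in, 3)
phi_pruned = [[row[j] for j in kept] for row in phi]
lam_pruned = [lam_in[j] for j in kept]
x_pruned = [x[j] for j in kept]
pruned = apply_layer(spectral_weights(phi_pruned, lam_pruned, lam_out), x_pruned)

print(kept, full, pruned)
```

In this toy case the pruned and unpruned layers give the same output because the discarded eigenvalues are exactly zero. In a trained network the discarded eigenvalues would generally only be small, so any change in performance would have to be measured rather than assumed.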

Journal: Journal of Statistical Mechanics: Theory and Experiment
Publication Date: Mar 22, 2024
Citations: 1
License type: CC BY

Similar Papers

When Pansharpening Meets Graph Convolution Network and Knowledge Distillation
Keyu Yan ... Danfeng Hong
IEEE Transactions on Geoscience and Remote Sensing | VOL. 60
01 Jan 2021

Learning Student Networks in the Wild
Hanting Chen ... Yunhe Wang
01 Jun 2021

Learning Student Networks via Feature Embedding.
Hanting Chen ... Chao Xu
IEEE Transactions on Neural Networks and Learning Systems | VOL. 32
01 Jan 2020

Self-Adaptive Teacher-Student framework for colon polyp segmentation from unannotated private data with public annotated datasets.
Yiwen Jia ... Fu Dai
PloS one | VOL. 19
28 Aug 2024
