Towards interpreting deep neural networks via layer behavior understanding

Jiezhang Cao, Xiangmiao Wu, Mingkui Tan, Xiping Hu, Jincheng Li

doi:10.1007/s10994-021-06074-8

Abstract

Deep neural networks (DNNs) have achieved success in many machine learning tasks. However, how to interpret DNNs is still an open problem. In particular, how hidden layers behave is not clearly understood. In this paper, relying on a teacher-student paradigm, we seek to understand the layer behaviors of DNNs by “monitoring” how layer distributions evolve both across layers, along the depth, and within a single layer, over training epochs. Drawing on optimal transport theory, we employ the Wasserstein distance (W-distance) to measure the divergence between the layer distribution and the target distribution. Theoretically, we prove that (i) the W-distance between the distribution of any layer and the target distribution tends to decrease along the depth; (ii) for a specific layer, the W-distance between the distribution in an iteration and the target distribution tends to decrease along training epochs; (iii) a deeper layer, however, is not always better than a shallower layer. Relying on these properties, we propose an early-exit inference method to improve the performance of multi-label classification. Moreover, our results help to analyze the stability of layer distributions and explain why auxiliary losses are helpful in training DNNs. Extensive experiments justify our theoretical findings.
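
To make the W-distance comparison concrete, the following is a minimal sketch, not the estimator used in the paper (the abstract does not specify one). It assumes one-dimensional samples of equal size, for which the empirical 1-Wasserstein distance reduces to the mean absolute difference between sorted samples. The function name `wasserstein_1d` and the toy values for `target`, `shallow` and `deep` are hypothetical and chosen only for illustration.

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-sized 1-D samples.

    Assumption: in one dimension with equal sample sizes, the optimal
    transport plan matches the i-th smallest value of xs to the i-th
    smallest value of ys, so the distance is the mean absolute gap.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs_sorted, ys_sorted)) / len(xs)


# Hypothetical toy samples standing in for a target distribution and for
# the outputs of a shallower and a deeper layer (illustrative values only).
target = [1.0, 2.0, 3.0, 4.0]
shallow = [4.0, 5.0, 6.0, 7.0]
deep = [1.5, 2.5, 3.5, 4.5]

d_shallow = wasserstein_1d(shallow, target)
d_deep = wasserstein_1d(deep, target)
print(d_shallow, d_deep)
```

In this toy setup the deeper sample lies closer to the target than the shallower one, which mirrors property (i). Property (iii) says that this ordering is a tendency and not a guarantee for every pair of layers.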

Journal: Machine Learning
Publication Date: Jan 28, 2022
Citations: 4

Similar Papers

Neuroevolution in Deep Neural Networks: Current Trends and Future Challenges
Edgar Galvan ... Peter Mooney
IEEE Transactions on Artificial Intelligence | VOL. 2
04 May 2021

A convergence analysis of Nesterov’s accelerated gradient method in training deep linear neural networks
Xin Liu ... Zhisong Pan
Information Sciences | VOL. 612
05 Sep 2022

A Framework for Distributed Deep Neural Network Training with Heterogeneous Computing Platforms
Bontak Gu ... Young Geun Kim
01 Dec 2019

TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar Systems
Sourjya Roy ... Shrihari Sridharan
IEEE Transactions on Very Large Scale Integration (VLSI) Systems | VOL. 29
31 Mar 2021
