Classification of Histopathologic Images of Breast Cancer by Multi-teacher Small-sample Knowledge Distillation

Abstract

Model fusion can effectively improve prediction performance, but it also increases computation time. In this paper, dual-stage progressive knowledge distillation is improved by combining it with multi-teacher knowledge distillation. A simple and effective method for integrating the soft targets of multiple teachers is proposed for multi-teacher knowledge distillation; it strengthens the guiding role of the better-performing models during distillation. Dual-stage progressive knowledge distillation is a method for small-sample knowledge distillation that uses progressive network grafting to carry out distillation when only a few samples are available. In the first stage, the student blocks are grafted one by one onto the teacher network and trained interleaved with the remaining teacher blocks; only the parameters of the grafted blocks are updated. In the second stage, the trained student blocks are grafted onto the teacher network in turn, so that the learned student blocks adapt to each other and finally replace the teacher network, yielding a lighter network structure. Training dual-stage progressive knowledge distillation with the soft targets obtained by this method instead of hard targets gave excellent results on the BreakHis dataset.
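The abstract does not spell out how the teachers' soft targets are integrated. The following is a minimal sketch of one plausible scheme, assuming each teacher's temperature-softened output is weighted by a quality score such as its validation accuracy. The function names, the `teacher_scores` weighting, and the temperature value are illustrative assumptions, not the authors' method.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def integrate_soft_targets(teacher_logits, teacher_scores, temperature=4.0):
    """Weighted average of the teachers' softened class distributions.

    teacher_logits: one list of logits per teacher, for a single sample.
    teacher_scores: hypothetical per-teacher quality scores (assumption:
        e.g. validation accuracy), so better teachers get larger weights.
    """
    score_sum = sum(teacher_scores)
    weights = [s / score_sum for s in teacher_scores]
    num_classes = len(teacher_logits[0])
    combined = [0.0] * num_classes
    for weight, logits in zip(weights, teacher_logits):
        probs = softmax(logits, temperature)
        for k in range(num_classes):
            combined[k] += weight * probs[k]
    return combined


def soft_target_loss(student_logits, soft_target, temperature=4.0):
    """Cross-entropy between the integrated soft target and the
    student's temperature-softened prediction."""
    probs = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(soft_target, probs))


# Two teachers, two classes (e.g. benign vs. malignant); values are made up.
target = integrate_soft_targets([[4.0, 0.0], [0.0, 1.0]], [0.9, 0.6])
loss = soft_target_loss([1.0, 0.5], target)
```

Because the weights are normalized, the integrated target remains a valid probability distribution, and a teacher with a higher score pulls the target further toward its own prediction.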

Similar Papers
  • Research Article
  • Citations: 57
  • 10.1109/tip.2022.3212905
A General Dynamic Knowledge Distillation Method for Visual Analytics.
  • Jan 1, 2022
  • IEEE Transactions on Image Processing
  • Zhigang Tu + 2 more

Existing knowledge distillation (KD) methods normally fix the weights of the teacher network and use its knowledge to guide the training of the student network non-interactively; this is called static knowledge distillation (SKD). SKD is widely used for model compression on homologous data and for knowledge transfer on heterogeneous data. However, a teacher network with fixed weights limits what the student network can learn from it. It is reasonable to expect that a teacher network that is itself continuously optimized can promote the learning ability of the student network dynamically. To overcome this limitation, we propose a novel dynamic knowledge distillation (DKD) method, in which the teacher network and the student network learn from each other interactively. Importantly, we analyze the effectiveness of DKD mathematically (see Eq. 4) and address one crucial issue caused by the continuous change of the teacher network during dynamic distillation by designing a valid loss function. We verify the practicality of DKD through extensive experiments on various visual tasks. For model compression, we conduct experiments on image classification and object detection. For knowledge transfer, video-based human action recognition is chosen for analysis. The experimental results on benchmark datasets (i.e., ILSVRC2012, COCO2017, HMDB51, UCF101) demonstrate that the proposed DKD improves the performance of these visual tasks by a large margin. The source code is publicly available online.

  • Conference Article
  • Citations: 3
  • 10.1145/3394171.3413985
Learning From Music to Visual Storytelling of Shots: A Deep Interactive Learning Mechanism
  • Oct 12, 2020
  • Jen-Chun Lin + 4 more

Learning from music to visual storytelling of shots is an interesting and emerging task. It produces a coherent visual story in the form of a shot type sequence, which not only expands the storytelling potential of a song but also facilitates the automatic concert video mashup process and storyboard generation. In this study, we present a deep interactive learning (DIL) mechanism for building a compact yet accurate sequence-to-sequence model to accomplish the task. Unlike the one-way transfer between a pre-trained teacher network (or ensemble network) and a student network in knowledge distillation (KD), the proposed method enables collaborative learning between an ensemble teacher network and a student network. Namely, the student network also teaches. Specifically, our method first learns a teacher network that is composed of several assistant networks to generate a shot type sequence and produce the soft target (shot types) distribution accordingly through KD. It then constructs the student network, which learns from both the ground truth label (hard target) and the soft target distribution to alleviate the difficulty of optimization and improve generalization capability. As the student network gradually advances, it in turn feeds knowledge back to the assistant networks, thereby improving the teacher network in each iteration. Owing to this interactive design, the DIL mechanism bridges the gap between the teacher and student networks and yields superior capability for both networks. Objective and subjective experimental results demonstrate that both the teacher and student networks can generate more attractive shot sequences from music, thereby enhancing the viewing and listening experience.
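Learning from both a hard target and a soft target distribution, as described above, is commonly done by mixing two cross-entropy terms. The sketch below shows that generic mixing only; the function name, the `alpha` mixing weight, and the temperature are illustrative assumptions and do not reproduce the DIL loss itself.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, soft_target, hard_label,
                      alpha=0.5, temperature=2.0):
    """Mix of hard-target and soft-target cross-entropy.

    alpha (assumed hyperparameter) weights the ground-truth term;
    (1 - alpha) weights the teacher's soft-target term.
    """
    # Hard-target term: standard cross-entropy at temperature 1.
    hard_probs = softmax(student_logits)
    hard_loss = -math.log(hard_probs[hard_label])
    # Soft-target term: cross-entropy against the teacher distribution,
    # computed on the student's temperature-softened prediction.
    soft_probs = softmax(student_logits, temperature)
    soft_loss = -sum(t * math.log(p) for t, p in zip(soft_target, soft_probs))
    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

Setting `alpha=1.0` recovers ordinary supervised training on the hard labels, while `alpha=0.0` trains on the teacher's soft targets alone.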

  • Conference Article
  • Citations: 3
  • 10.1109/icce-asia57006.2022.9954844
Plant Leaf Segmentation Using Knowledge Distillation
  • Oct 26, 2022
  • Joo-Yeon Jung + 2 more

This paper proposes a method to segment plant leaves using knowledge distillation. Unlike the existing knowledge distillation method aimed at lightening the model, the architectures of the teacher and student networks are kept identical. Plants have many leaves, and each leaf is very small. To segment each plant leaf well, clustering is used through spatial embedding. The teacher and student networks perform segmentation based on spatial embedding. The teacher network is trained with a large dataset and then distills its segmentation knowledge into the student network. Two types of knowledge are distilled from the teacher network: feature distillation and attention distillation. The results of the experiment demonstrate that better instance segmentation can be achieved when using knowledge distillation.

  • Research Article
  • Citations: 37
  • 10.1016/j.eswa.2021.114813
Data-free knowledge distillation in neural networks for regression
  • Mar 6, 2021
  • Expert Systems with Applications
  • Myeonginn Kang + 1 more


  • Research Article
  • 10.1088/1742-6596/2759/1/012010
Distilling Structural Knowledge for Platform-Aware Semantic Segmentation
  • May 1, 2024
  • Journal of Physics: Conference Series
  • Guilin Li + 2 more

Knowledge Distillation (KD) aims to distill the dark knowledge of a high-powered teacher network into a student network, which can improve the capacity of the student network and has been successfully applied to semantic segmentation. However, standard knowledge distillation approaches represent only the supervisory signal of the teacher network as the dark knowledge, while ignoring the impact of network architecture during distillation. In this paper, we find that a student network whose architecture is more similar to that of the teacher network gains more from distillation. Therefore, a more general paradigm for knowledge distillation is to distill both the soft labels and the structure of the teacher network. We propose a novel Structural Distillation (SD) method, which introduces structural similarity constraints into vanilla knowledge distillation. We leverage Neural Architecture Search (NAS) to search a well-designed search space for the optimal student structure for semantic segmentation, one that mimics the given teacher in terms of both soft labels and network structure. Experimental results show that our proposed method outperforms both NAS with conventional knowledge distillation and human-designed methods, and achieves state-of-the-art performance on the Cityscapes dataset under various platform-aware latency constraints. Furthermore, the best architecture discovered on Cityscapes also transfers well to the PASCAL VOC2012 dataset.

  • Research Article
  • Citations: 31
  • 10.1109/tcsvt.2023.3271124
Skill-Transferring Knowledge Distillation Method
  • Nov 1, 2023
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Shunzhi Yang + 5 more

Knowledge distillation is a deep learning method that mimics the way that humans teach, i.e., a teacher network is used to guide the training of a student one. Knowledge distillation can generate an efficient student network to facilitate deployment in resource-constrained edge computing devices. Existing studies have typically mined knowledge from a teacher network and transferred it to a student one. The latter can only passively receive knowledge but cannot understand how the former acquires the knowledge, thus limiting the latter’s performance improvement. Inspired by the old Chinese saying “Give a man a fish and you feed him for a day; teach a man how to fish and you feed him for a lifetime,” this work proposes a Skill-transferring Knowledge Distillation (SKD) method to boost a student network’s ability to create new valuable knowledge. SKD consists of two main meta-learning networks: Teacher Behavior Teaching and Teacher Experience Teaching. The former captures the process of a teacher network’s learning behavior in the hidden layers and can predict the teacher network’s subsequent behavior based on previous ones. The latter models the optimal empirical knowledge of a teacher network’s output layer at each learning stage. With their help, a teacher network can provide its actions to a student one in the subsequent behavior and its optimal empirical knowledge in the current stage. The overall performance of SKD is verified through its application to multiple object recognition tasks and comparison with the state of the art.

  • Research Article
  • Citations: 60
  • 10.1016/j.neucom.2022.10.083
Low-light image enhancement with knowledge distillation
  • Nov 11, 2022
  • Neurocomputing
  • Ziwen Li + 2 more


  • Research Article
  • Citations: 10
  • 10.1016/j.asoc.2024.112237
Efficient face anti-spoofing via head-aware transformer based knowledge distillation with 5 MB model parameters
  • Sep 10, 2024
  • Applied Soft Computing
  • Jun Zhang + 6 more


  • Research Article
  • Citations: 11
  • 10.1016/j.media.2022.102725
Adversarial learning-based multi-level dense-transmission knowledge distillation for AP-ROP detection.
  • Feb 1, 2023
  • Medical Image Analysis
  • Hai Xie + 8 more


  • Research Article
  • Citations: 41
  • 10.1007/s11263-019-01286-x
Learning an Evolutionary Embedding via Massive Knowledge Distillation
  • Jan 21, 2020
  • International Journal of Computer Vision
  • Xiang Wu + 3 more

Knowledge distillation methods aim at transferring knowledge from a large, powerful teacher network to a small, compact student one. These methods often focus on closed-set classification problems and on matching features between teacher and student networks from a single sample. However, many real-world classification problems are open-set. This paper proposes an Evolutionary Embedding Learning (EEL) framework to learn a fast and accurate student network for open-set problems via massive knowledge distillation. First, we revisit the formulation of canonical knowledge distillation and make it suitable for open-set problems with massive numbers of classes. Second, by introducing an angular constraint, a novel correlated embedding loss (CEL) is proposed to match the embedding spaces of the teacher and student networks from a global perspective. Lastly, we propose a simple yet effective paradigm for developing a fast and accurate student network with knowledge distillation. We show that it is possible to obtain an accelerated student network without sacrificing accuracy compared with its teacher network. The experimental results are encouraging: EEL achieves better performance than other state-of-the-art methods on various large-scale open-set problems, including face recognition, vehicle re-identification and person re-identification.

  • Research Article
  • Citations: 65
  • 10.1007/s10489-022-03486-4
Teacher-student collaborative knowledge distillation for image classification
  • May 4, 2022
  • Applied Intelligence
  • Chuanyun Xu + 5 more

A single model usually cannot learn all the appropriate features with limited data, thus leading to poor performance when test data are used. To improve model performance, we propose a teacher-student collaborative knowledge distillation (TSKD) method based on knowledge distillation and self-distillation. The method consists of two parts: learning in the teacher network and self-teaching in the student network. Learning in the teacher network allows the student network to use knowledge from the teacher network. Self-teaching in the student network is to build a multi-exit network based on self-distillation and provide deep features as supervised information for training. In the inference stage, we use ensembles to vote on the classification results of multiple sub-models in the student network. The experimental results demonstrate the superior performance of our method compared with a traditional knowledge distillation method and a self-distillation-based multi-exit network.
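The inference-stage voting over sub-models can be illustrated with a simple majority vote. This is a minimal sketch assuming each exit of the multi-exit student emits one class label per sample; the function name and the tie-breaking rule (the label seen first among tied labels wins) are assumptions, not details given in the abstract.

```python
from collections import Counter


def majority_vote(exit_predictions):
    """Return the class label predicted by the most sub-model exits.

    exit_predictions: one predicted class label per exit, for one sample.
    Ties go to the tied label that appears first in the list (assumption).
    """
    counts = Counter(exit_predictions)
    return counts.most_common(1)[0][0]


# Four hypothetical exits voting on a single image.
final_label = majority_vote([2, 1, 2, 0])
```

A weighted vote or an average of the exits' class probabilities would be equally reasonable alternatives; the abstract does not say which variant is used.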

  • Research Article
  • Citations: 27
  • 10.1016/j.asoc.2020.107051
A tucker decomposition based knowledge distillation for intelligent edge applications
  • Dec 29, 2020
  • Applied Soft Computing
  • Cheng Dai + 3 more


  • Research Article
  • Citations: 53
  • 10.1016/j.knosys.2022.108136
Multi-level knowledge distillation for low-resolution object detection and facial expression recognition
  • Jan 10, 2022
  • Knowledge-Based Systems
  • Tingsong Ma + 2 more


  • Research Article
  • Citations: 1
  • 10.1007/s11548-025-03346-9
Feature distance-weighted adaptive decoupled knowledge distillation for medical image segmentation.
  • Apr 22, 2025
  • International journal of computer assisted radiology and surgery
  • Xiangchun Yu + 4 more

This paper aims to apply decoupled knowledge distillation (DKD) to medical image segmentation, focusing on transferring knowledge from a high-performance teacher network to a lightweight student network, thereby facilitating model deployment on embedded devices. We initially decouple the distillation loss into pixel-wise target class knowledge distillation (PTCKD) and pixel-wise non-target class knowledge distillation (PNCKD). Subsequently, to address the limitations of the fixed weight paradigm in PTCKD, we propose a novel feature distance-weighted adaptive decoupled knowledge distillation (FDWA-DKD) method. FDWA-DKD quantifies the feature disparity between student and teacher, generating instance-level adaptive weights for PTCKD. We design a feature distance weighting (FDW) module that dynamically calculates feature distance to obtain adaptive weights, integrating feature space distance information into logit distillation. Lastly, we introduce a class-wise feature probability distribution loss to encourage the student to mimic the teacher's spatial distribution. Extensive experiments conducted on the Synapse and FLARE22 datasets demonstrate that our proposed FDWA-DKD achieves satisfactory performance, yielding optimal Dice scores and, in some instances, surpassing the performance of the teacher network. Ablation studies further validate the effectiveness of each module within our proposed method. Our method overcomes the constraints of traditional distillation methods by offering instance-level adaptive learning weights tailored to PTCKD. By quantifying student-teacher feature disparity and minimizing class-wise feature probability distribution loss, our method outperforms other distillation methods.

  • Research Article
  • Citations: 3
  • 10.1016/j.displa.2023.102543
Trained teacher: Who is good at teaching
  • Sep 20, 2023
  • Displays
  • Xingzhu Liang + 5 more

