Enhancing graph neural networks through universal self-knowledge distillation.

Similar Papers
  • Conference Article
  • Cited by 9
  • 10.1109/icassp49357.2023.10095504
Training Sound Event Detection with Soft Labels from Crowdsourced Annotations
  • Jun 4, 2023
  • Irene Martín-Morató + 3 more

In this paper, we study the use of soft labels to train a system for sound event detection (SED). Soft labels can result from annotations which account for human uncertainty about categories, or emerge as a natural representation of multiple opinions in annotation. Converting annotations to hard labels results in unambiguous categories for training, at the cost of losing the details of the label distribution. This work investigates how soft labels can be used, and what benefits they bring in training a SED system. The results show that the system is capable of learning information about the activity of the sounds which is reflected in the soft labels, and is able to detect sounds that are missed in the typical binary target training setup. We also release a new dataset produced through crowdsourcing, containing temporally strong labels for sound events in real-life recordings, with both soft and hard labels.
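As a rough illustration of how a soft label can emerge from multiple opinions, the sketch below averages binary annotator votes for one time segment and then binarizes the result. The function names and the 0.5 threshold are assumptions made for this example; they are not taken from the paper.

```python
def soft_label(votes):
    """Fraction of annotators who marked the event as active (votes are 0 or 1)."""
    if not votes:
        raise ValueError("at least one vote is required")
    return sum(votes) / len(votes)


def hard_label(soft, threshold=0.5):
    """Binarize a soft label; the 0.5 threshold is an illustrative assumption."""
    return 1 if soft >= threshold else 0


# Two of five hypothetical annotators heard the event in this segment.
votes = [1, 1, 0, 0, 0]
soft = soft_label(votes)   # keeps the 2-out-of-5 disagreement
hard = hard_label(soft)    # collapses it to "inactive"
```

Under these assumptions the hard label discards the fact that a minority of annotators did hear the event, which is the information loss the abstract refers to.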

  • Conference Article
  • Cited by 3
  • 10.1109/gcce46687.2019.9015500
Knowledge Distillation Using Soft and Hard Labels and Annealing for Acoustic Model Training
  • Oct 1, 2019
  • Control theory & applications
  • Yuuki Tachioka

While larger acoustic models provide better speech recognition performance, smaller models are appropriate when computational resources are limited. Knowledge distillation is used to train small models on the basis of soft labels obtained from larger models instead of hard labels obtained from reference transcriptions. In this work, we investigated two methods for using both types of labels: sequence-level distillation (SD), in which the loss function selected is related to the hard or soft labels, and sequence-level interpolation (SI), in which both loss functions are interpolated. Experiments showed that SI was consistently better than SD, and that SI with annealing performed the best.
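The interpolation idea can be sketched as a weighted sum of a hard-label loss and a soft-label loss, with the weight changed over training. This is a simplified per-example sketch, not the sequence-level formulation of the paper. The abstract does not state the annealing schedule, so the linear schedule, its direction, and all function names below are assumptions.

```python
import math


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def interpolated_loss(hard, soft, probs, soft_weight):
    """Weighted sum of the hard-label loss and the soft-label loss."""
    return ((1.0 - soft_weight) * cross_entropy(hard, probs)
            + soft_weight * cross_entropy(soft, probs))


def annealed_weight(step, total_steps, start=1.0, end=0.0):
    """Hypothetical linear annealing of the soft-label weight from start to end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


hard = [1.0, 0.0, 0.0]     # one-hot reference label
soft = [0.7, 0.2, 0.1]     # teacher posterior (made-up values)
probs = [0.6, 0.3, 0.1]    # student prediction (made-up values)
loss_mid = interpolated_loss(hard, soft, probs, annealed_weight(5, 10))
```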

  • Conference Article
  • Cited by 3
  • 10.24963/ijcai.2021/319
Isotonic Data Augmentation for Knowledge Distillation
  • Aug 1, 2021
  • Wanyun Cui + 1 more

Knowledge distillation uses both real hard labels and soft labels predicted by the teacher model as supervision. Intuitively, we expect the soft label probabilities and hard label probabilities to be concordant. However, in real knowledge distillation, we found critical rank violations between hard labels and soft labels for augmented samples. For example, for an augmented sample x = 0.7 * cat + 0.3 * panda, a meaningful soft label distribution should have the same rank: P(cat|x)>P(panda|x)>P(other|x). But real teacher models usually violate the rank: P(tiger|x)>P(panda|x)>P(cat|x). We attribute the rank violations to the increased difficulty of understanding augmented samples for the teacher model. Empirically, we found that the violations harm knowledge transfer. In this paper, we denote eliminating rank violations in data augmentation for knowledge distillation as isotonic data augmentation (IDA). We use isotonic regression (IR), a classic statistical algorithm, to eliminate the rank violations. We show that IDA can be modeled as a tree-structured IR problem and give an O(c*log(c)) optimal algorithm, where c is the number of labels. To further reduce the time complexity of the optimal algorithm, we also propose a GPU-friendly approximation algorithm with linear time complexity. We have verified on various datasets and data augmentation baselines that (1) rank violation is a general phenomenon for data augmentation in knowledge distillation, and (2) our proposed IDA algorithms effectively increase the accuracy of knowledge distillation by resolving the rank violations.
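The rank-violation check in the cat/panda example can be sketched as follows. This only detects violations; it does not implement the isotonic regression step that IDA uses to remove them. The function name and the probability values are made up for illustration.

```python
from itertools import permutations


def rank_violations(mix_weights, teacher_probs):
    """Return class pairs (a, b) where a has the larger mixing weight
    but the teacher assigns a the smaller probability."""
    violations = []
    for a, b in permutations(list(teacher_probs), 2):
        heavier = mix_weights.get(a, 0.0) > mix_weights.get(b, 0.0)
        if heavier and teacher_probs[a] < teacher_probs[b]:
            violations.append((a, b))
    return violations


# Augmented sample x = 0.7 * cat + 0.3 * panda; classes not mixed in get weight 0.
mix = {"cat": 0.7, "panda": 0.3}

# Hypothetical teacher output that ranks tiger > panda > cat.
violating = {"tiger": 0.5, "panda": 0.3, "cat": 0.2}

# Hypothetical teacher output that respects cat > panda > other.
concordant = {"cat": 0.6, "panda": 0.3, "tiger": 0.1}

bad_pairs = rank_violations(mix, violating)
good_pairs = rank_violations(mix, concordant)
```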

  • Research Article
  • Cited by 13
  • 10.1016/j.ins.2022.08.057
KDCTime: Knowledge distillation with calibration on InceptionTime for time-series classification
  • Aug 18, 2022
  • Information Sciences
  • Xueyuan Gong + 5 more


  • Research Article
  • Cited by 1
  • 10.1109/tfuzz.2025.3624974
A Linguistically Interpretable Fuzzy Fault Diagnosis Model: Knowledge Distillation Perspective
  • Jan 1, 2026
  • IEEE Transactions on Fuzzy Systems
  • Meng-Wei Li + 3 more

Emerging deep fuzzy neural network-based fault diagnosis (FD), which integrates deep neural networks (DNNs) and fuzzy neural networks (FNNs) in sequential or parallel architectures, has demonstrated great potential in both high performance and model interpretability. However, in the cascade architecture, the membership functions of the FNN struggle to finely describe the complex fault features extracted by the DNN, leading to reduced FD performance. Meanwhile, the abstraction of fault features undermines the FNN's interpretability. In the parallel architecture, designing a reasonable strategy to fuse the heterogeneous features extracted by the DNN and FNN is nontrivial. To address these challenges, we propose a novel FD model called linguistically interpretable fuzzy neural network with knowledge distillation, which integrates DNN and FNN through the perspective of knowledge distillation, aiming for high performance while preserving the FNN's intrinsic interpretability. The model extracts modality probability knowledge from a well-trained DNN teacher unit and feeds it into the knowledge distillation unit to generate soft labels encapsulating modality similarity knowledge. These soft labels carry the teacher unit's core insights into FD and activate implicit information of negative modalities. Then, the knowledge-based Takagi–Sugeno–Kang unit uses the soft labels as consequent variables to perform linguistically interpretable rule reasoning from original features to fault modalities. The model is optimized by minimizing a composite loss comprising cross-entropy, soft label regularization, and L2 regularization, ensuring more knowledge is transferred to the distilled unit. Comprehensive evaluation across a series of industrial process cases validated the model's effectiveness in performance and interpretability.

  • Research Article
  • Cited by 52
  • 10.1109/tmm.2023.3321480
Parameter-Efficient and Student-Friendly Knowledge Distillation
  • Jan 1, 2024
  • IEEE Transactions on Multimedia
  • Jun Rao + 6 more

Pre-trained models are frequently employed in multimodal learning. However, these models have too many parameters and need too much effort to fine-tune for downstream tasks. Knowledge distillation (KD) is a method to transfer knowledge using the soft label from this pre-trained teacher model to a smaller student, where the parameters of the teacher are fixed (or partially fixed) during training. Recent studies show that this mode may cause difficulties in knowledge transfer due to the mismatched model capacities. To alleviate the mismatch problem, adjustment of temperature parameters, label smoothing, and teacher-student joint training methods (online distillation) have been proposed to smooth the soft label of a teacher network. But those methods rarely explain how smoothed soft labels enhance KD performance. The main contributions of our work are the discovery, analysis, and validation of the effect of the smoothed soft label, and a less time-consuming, adaptive method for transferring the pre-trained teacher's knowledge, namely PESF-KD, which adaptively tunes the soft labels of the teacher network. Technically, we first mathematically formulate the mismatch as the sharpness gap between the teacher's and student's predictive distributions, where we show such a gap can be narrowed with the appropriate smoothness of the soft label. Then, we introduce an adapter module for the teacher and only update the adapter to obtain soft labels with appropriate smoothness. Experiments on various benchmarks including CV and NLP show that PESF-KD can significantly reduce the training cost while obtaining competitive results compared to advanced online distillation methods.
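Of the smoothing options listed above, temperature adjustment is the simplest to sketch: dividing the teacher logits by a temperature greater than 1 before the softmax yields a flatter soft label. This is plain temperature scaling, not the adapter module of PESF-KD, and the logits and function name are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


teacher_logits = [4.0, 1.0, 0.0]                # hypothetical teacher outputs
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
smooth = softmax_with_temperature(teacher_logits, temperature=4.0)
```

With these values the largest probability in `smooth` is lower than in `sharp`, while the class ranking is unchanged.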

  • Research Article
  • Cited by 22
  • 10.3390/rs14184523
A Novel Knowledge Distillation Method for Self-Supervised Hyperspectral Image Classification
  • Sep 10, 2022
  • Remote Sensing
  • Qiang Chi + 3 more

Using deep learning to classify hyperspectral images (HSI) with only a few labeled samples available is a challenge. Recently, the knowledge distillation method based on soft label generation has been used to solve classification problems with a limited number of samples. Unlike normal labels, soft labels are considered the probability of a sample belonging to a certain category, and are therefore more informative for classification. The existing soft label generation methods for HSI classification cannot fully exploit the information of existing unlabeled samples. To solve this problem, we propose a novel self-supervised learning method with knowledge distillation for HSI classification, termed SSKD. The main motivation is to exploit more valuable information for classification by adaptively generating soft labels for unlabeled samples. First, similarity discrimination is performed using all unlabeled and labeled samples by considering both spatial distance and spectral distance. Then, an adaptive nearest neighbor matching strategy is performed for the generated data. Finally, probabilistic judgment for the category is performed to generate soft labels. Compared to the state-of-the-art method, our method improves the classification accuracy by 4.88%, 7.09% and 4.96% on three publicly available datasets, respectively.

  • Research Article
  • Cited by 36
  • 10.1109/access.2020.3021711
Knowledge Distillation in Acoustic Scene Classification
  • Jan 1, 2020
  • IEEE Access
  • Jee-Weon Jung + 3 more

Common acoustic properties that different classes share degrade the performance of acoustic scene classification systems. This results in a phenomenon where a few confusing pairs of acoustic scenes account for a significant proportion of all misclassified audio segments. In this article, we propose adopting a knowledge distillation framework that trains deep neural networks using soft labels. Soft labels, extracted from another pre-trained deep neural network, are used to reflect the similarity between different classes that share similar acoustic properties. We also propose utilizing specialist models to provide additional soft labels. Each specialist model in this study refers to a deep neural network that concentrates on discriminating a single pair of acoustic scenes that are frequently misclassified. Self multi-head attention is explored for training specialist deep neural networks to further concentrate on target pairs of classes. The goal of this article is to train a single deep neural network that demonstrates performance equivalent to, or higher than, an ensemble of multiple models, by distilling the knowledge from several models. Diverse experiments conducted using the detection and classification of acoustic scenes and events 2019 task 1-a dataset demonstrate that the knowledge distillation framework is effective in acoustic scene classification. Specialist models successfully decrease the number of misclassified audio segments in the target classes. The final single model, trained with the proposed knowledge distillation from several models, including specialists trained using an attention mechanism, shows a classification accuracy of 77.63%, higher than an ensemble of the baseline and multiple specialists.

  • Research Article
  • Cited by 1
  • 10.3390/rs16203853
Instance-Level Scaling and Dynamic Margin-Alignment Knowledge Distillation for Remote Sensing Image Scene Classification
  • Oct 17, 2024
  • Remote Sensing
  • Chuan Li + 3 more

Remote sensing image (RSI) scene classification aims to identify semantic categories in RSI using neural networks. However, high-performance deep neural networks typically demand substantial storage and computational resources, making practical deployment challenging. Knowledge distillation has emerged as an effective technique for developing compact models that maintain high classification accuracy in RSI tasks. Existing knowledge distillation methods often overlook the high inter-class similarity in RSI scenes, leading to low-confidence soft labels from the teacher model, which can mislead the student model. Conversely, overly confident soft labels may discard valuable non-target information. Additionally, the significant intra-class variability in RSI contributes to instability in the model’s decision boundaries. To address these challenges, we propose an efficient method called instance-level scaling and dynamic margin-alignment knowledge distillation (ISDM) for RSI scene classification. To balance the target and non-target class influence, we apply an entropy regularization loss to scale the teacher model’s target class at the instance level. Moreover, we introduce dynamic margin alignment between the student and teacher models to improve the student’s discriminative capability. By optimizing soft labels and enhancing the student’s ability to distinguish between classes, our method reduces the effects of inter-class similarity and intra-class variability. Experimental results on three public RSI scene classification datasets (AID, UCMerced, and NWPU-RESISC) demonstrate that our method achieves state-of-the-art performance across all teacher–student pairs with lower computational costs. Additionally, we validate the generalization of our approach on general datasets, including CIFAR-100 and ImageNet-1k.

  • Research Article
  • Cited by 21
  • 10.3390/rs14194813
Remote Sensing Image Scene Classification via Self-Supervised Learning and Knowledge Distillation
  • Sep 27, 2022
  • Remote Sensing
  • Yibo Zhao + 3 more

The main challenges of remote sensing image scene classification are extracting discriminative features and making full use of the training data. The current mainstream deep learning methods usually only use the hard labels of the samples, ignoring the potential soft labels and natural labels. Self-supervised learning can take full advantage of natural labels. However, it is difficult to train a self-supervised network due to the limitations of the dataset and computing resources. We propose a self-supervised knowledge distillation network (SSKDNet) to solve the aforementioned challenges. Specifically, the feature maps of the backbone are used as supervision signals, and the branch learns to restore the low-level feature maps after background masking and shuffling. The “dark knowledge” of the branch is transferred to the backbone through knowledge distillation (KD). The backbone and branch are optimized together in the KD process without independent pre-training. Moreover, we propose a feature fusion module to fuse feature maps dynamically. In general, SSKDNet can make full use of soft labels and has excellent discriminative feature extraction capabilities. Experimental results conducted on three datasets demonstrate the effectiveness of the proposed approach.

  • Addendum
  • Cited by 1
  • 10.1016/j.eswa.2023.122167
WITHDRAWN: An accurate and lightweight model for driver distraction detection via multiple teacher knowledge distillation
  • Oct 1, 2023
  • Expert Systems with Applications
  • Hong Vin Koay + 2 more


  • Research Article
  • Cited by 1
  • 10.1080/0952813x.2024.2396131
Towards accurate diagnosis: exploring knowledge distillation and self-attention in multimodal medical image fusion
  • Sep 4, 2024
  • Journal of Experimental & Theoretical Artificial Intelligence
  • Radhika P + 3 more

Multimodal medical image fusion aims to aggregate significant information based on the characteristics of medical images from different modalities. Existing research in image fusion faces several major limitations, including a scarcity of paired data, noisy and inconsistent modalities, a lack of contextual relationships, and suboptimal feature extraction and fusion techniques. In response to these challenges, this research proposes a novel adaptive fusion approach. Our knowledge distillation (KD) model extracts informative features from multimodal medical images using various key components. A teacher network is employed to emphasise the suitability and complexity of capturing high-level abstract features. Soft labels are utilised to transfer knowledge between the teacher network and the student network. During student network training, we minimise the divergence between these soft labels. To enhance the adaptive fusion of extracted features from different modalities, we apply a self-attention mechanism. Training this self-attention mechanism minimises the loss function, encouraging attention scores to capture relevant contextual relationships between features. Additionally, a cross-modal consistency module aligns the extracted features to ensure spatial consistency and meaningful fusion. Our adaptive fusion strategy effectively combines features to enhance the diagnostic value and quality of fused images. We employ generator and discriminator architectures for synthesising fused images and distinguishing between real and generated fused images. Comprehensive analysis is conducted on the basis of diverse evaluation measures. Experimental results demonstrate improved fusion outcomes with values of 0.92, 41.58, 7.25, 0.958, 0.759, 0.947, 0.90, 7.05, 0.0726, and 76 s for SSIM, PSNR, FF, VIF, UIQI, FMI, EITF, entropy, RMSE, and execution time, respectively.

  • Research Article
  • Cited by 23
  • 10.1609/aaai.v37i4.25570
IterDE: An Iterative Knowledge Distillation Framework for Knowledge Graph Embeddings
  • Jun 26, 2023
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Jiajun Liu + 3 more

Knowledge distillation for knowledge graph embedding (KGE) aims to reduce the KGE model size to address the challenges of storage limitations and knowledge reasoning efficiency. However, current work still suffers from performance drops when compressing a high-dimensional original KGE model to a low-dimensional distilled KGE model. Moreover, most work focuses on the reduction of inference time but ignores the time-consuming training process of distilling KGE models. In this paper, we propose IterDE, a novel knowledge distillation framework for KGEs. First, IterDE introduces an iterative distillation approach and enables a KGE model to alternately be a student model and a teacher model during the iterative distillation process. Consequently, knowledge can be transferred smoothly between high-dimensional teacher models and low-dimensional student models, while preserving good KGE performance. Furthermore, to optimize the training process, we consider that the different optimization objectives of the hard label loss and the soft label loss can affect the efficiency of training, and we propose a soft-label weighting dynamic adjustment mechanism that balances the inconsistency of optimization direction between the hard and soft label losses by gradually increasing the weight of the soft label loss. Our experimental results demonstrate that IterDE achieves a new state-of-the-art distillation performance for KGEs compared to strong baselines on the link prediction task. Significantly, IterDE can reduce the training time by 50% on average. Finally, more exploratory experiments show that the soft-label weighting dynamic adjustment mechanism and more fine-grained iterations can improve distillation performance.

  • Research Article
  • Cited by 65
  • 10.1109/tim.2021.3091498
HS-KDNet: A Lightweight Network Based on Hierarchical-Split Block and Knowledge Distillation for Fault Diagnosis With Extremely Imbalanced Data
  • Jan 1, 2021
  • IEEE Transactions on Instrumentation and Measurement
  • Jin Deng + 5 more

Because of the cost, it is unrealistic to sample the failure state for a long time, so data collected in engineering scenarios are usually extremely imbalanced. However, imbalanced training data have a negative effect on data-driven fault diagnosis algorithms. When the data are extremely imbalanced, this problem becomes more challenging. Furthermore, to reduce the deployment cost, industrial practice often requires that the parameters and computation of the deployed diagnosis model stay within a certain range, which calls for a lightweight diagnosis model. Therefore, in this article, a novel lightweight framework for fault diagnosis with extremely imbalanced data, called HS-KDNet, is proposed. Soft labels generated by knowledge distillation can represent the similarity between categories, i.e., by learning from the soft labels, the information about all categories is considered in each update of the parameters, not only the information about the current samples. Consequently, unlike traditional data re-balancing strategies based on generating pseudo samples, we utilized knowledge distillation to suppress the adverse effects of imbalanced data for the first time. On two classical bearing datasets, the effectiveness and superiority of the proposed HS-KDNet were demonstrated, and the experimental results showed that, beyond HS-KDNet itself, knowledge distillation can also significantly inhibit the adverse effects of imbalanced data on other simple models.

  • Research Article
  • Cited by 1
  • 10.1016/j.jiixd.2024.06.002
Dual defense: Combining preemptive exclusion of members and knowledge distillation to mitigate membership inference attacks
  • Jun 27, 2024
  • Journal of Information and Intelligence
  • Jun Niu + 16 more

