Knowledge Distillation Research Articles

Large, well-trained deep neural networks (DNNs) have drawn attention to model compression for resource-constrained equipment, especially edge devices. Knowledge distillation (KD) is one of the most widely used compression techniques for edge deployment: it derives a lightweight student model from a well-trained teacher model released on a public platform. However, it has been observed empirically that a backdoor in the teacher model is transferred to the student model during KD. Although numerous KD methods have been proposed, most focus on distilling a high-performing student model and do not consider robustness. Other work uses KD as a backdoor mitigation tool but does not compress the model at the same time. Achieving both objectives of robust KD, i.e., student model performance and backdoor mitigation, therefore remains an open problem. To address it, we propose RobustKD, a robust knowledge distillation method that compresses the model while mitigating backdoors based on feature variance. RobustKD differs from previous work in three key aspects: (1) effectiveness: by distilling the feature map of the teacher model after detoxification, the student model's main-task performance is comparable to that of the teacher model; (2) robustness: by reducing the feature variance between the teacher model and the student model, it mitigates the backdoor in the student model when the teacher model is backdoored; (3) generality: RobustKD still performs well across multiple data models (e.g., WRN 28-4, Pyramid-200) and diverse DNNs (e.g., ResNet50, MobileNet). Comprehensive experiments on four datasets, six models, two distillation methods, and two backdoor attack methods, compared against four baselines, show that the proposed method achieves state-of-the-art performance in both accuracy and robustness. RobustKD also remains effective when adaptive attacks are considered. The code of RobustKD is open-sourced at https://github.com/Xming-Z/RobustKD.
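
To make the two ingredients named in this abstract concrete, here is a minimal sketch of a distillation objective that combines a classic temperature-softened soft-label term with a penalty on the gap between teacher and student feature variance. It is an illustration only and not the RobustKD loss, whose exact form the abstract does not give. The function names, the `alpha` weight, the default temperature, and the use of a single scalar variance per feature vector are all assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def variance(values):
    """Population variance of a flat list of feature activations."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def feature_variance_gap(student_features, teacher_features):
    """Hypothetical penalty: squared difference of the two feature variances."""
    return (variance(student_features) - variance(teacher_features)) ** 2


def total_loss(student_logits, teacher_logits,
               student_features, teacher_features,
               alpha=0.5, temperature=4.0):
    """Soft-label term plus an assumed alpha-weighted variance-gap term."""
    soft = kd_loss(student_logits, teacher_logits, temperature)
    gap = feature_variance_gap(student_features, teacher_features)
    return soft + alpha * gap
```

In this sketch both terms vanish when the student reproduces the teacher's logits and feature spread, so `alpha` only controls how strongly a mismatch in feature variance is penalized relative to a mismatch in predictions.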

Background and Objective: Recent advancements in brain-computer interface (BCI) technology have seen a significant shift towards complex decoding models such as deep neural networks (DNNs) to enhance performance. These models are particularly important for sophisticated tasks such as regression for decoding arbitrary movements. However, BCI models trained and tested on individual data often show limited performance and limited generalizability across subjects. This limitation is primarily due to the very large number of parameters in DNN models: training such complex models demands extensive datasets. Nevertheless, group data pooled from many subjects may not yield sufficient decoding performance, because neural signals vary inherently both across individuals and over time.

Methods: To address these challenges, this study proposed a transfer learning approach that adapts to subject-specific variability in cortical regions. Our method involved training two separate movement decoding models: one on individual data and another on pooled group data. We then created a salience map for each cortical region from the individual model, which helped us identify the variance of each input's contribution across subjects. Based on this contribution variance, we combined the individual and group models using a modified knowledge distillation framework. This approach allowed the group model to be universally applicable by assigning greater weights to input data, while the individual model was fine-tuned to focus on areas with significant individual variance.

Results: Our combined model effectively captured individual variability. We validated this approach with nine subjects performing arm-reaching tasks; our method (mean correlation coefficient, r = 0.75) outperformed both the individual models (r = 0.70) and the group models (r = 0.40) in decoding performance. In particular, there were notable improvements in cases where the individual model performed poorly (e.g., from r = 0.50 with the individual decoder to r = 0.61 with the proposed decoder).

Conclusions: These results not only demonstrate the potential of our method for robust BCI, but also underscore its ability to generalize individual data for broader applicability.
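
The idea of weighting by cross-subject contribution variance can be sketched as follows. This is a hypothetical illustration, not the study's framework: the abstract does not specify how salience is computed or how the two models are combined. The sketch assumes one salience value per cortical region per subject, normalizes the per-region variance to the range 0 to 1, and uses it to blend an individual-model output with a group-model output. All function names and the blending rule are assumptions.

```python
def contribution_variance(salience_by_subject):
    """Per-region variance of salience across subjects.

    salience_by_subject: list of per-subject lists, one value per region.
    """
    n_subjects = len(salience_by_subject)
    n_regions = len(salience_by_subject[0])
    variances = []
    for region in range(n_regions):
        column = [subject[region] for subject in salience_by_subject]
        mean = sum(column) / n_subjects
        variances.append(sum((v - mean) ** 2 for v in column) / n_subjects)
    return variances


def region_weights(variances):
    """Scale variances to [0, 1]; higher means more subject-specific."""
    peak = max(variances)
    if peak == 0:
        return [0.0 for _ in variances]
    return [v / peak for v in variances]


def blend(individual, group, weights):
    """Assumed rule: lean on the individual model where variance is high."""
    return [w * i + (1.0 - w) * g
            for i, g, w in zip(individual, group, weights)]


# Two hypothetical subjects, three hypothetical cortical regions.
salience = [[0.2, 0.5, 0.9],
            [0.2, 0.7, 0.1]]
weights = region_weights(contribution_variance(salience))
targets = blend([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], weights)
```

With these made-up numbers the first region, on which both subjects agree, relies entirely on the group output, while the third region, where the subjects differ most, relies entirely on the individual output.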

Articles published on Knowledge Distillation

Injecting the score of the first-stage retriever as text improves BERT-based re-rankers

Efficient and Lightweight Neural Network for Hard Hat Detection

FedGK: Communication-Efficient Federated Learning through Group-Guided Knowledge Distillation

UAWC: An intelligent underwater acoustic target recognition system for working conditions mismatching

An Optimization Method for Lightweight Rock Classification Models: Transferred Rich Fine-Grained Knowledge.

Robust knowledge distillation based on feature variance against backdoored teacher model

A Forest Fire Smoke Monitoring System Based on a Lightweight Neural Network for Edge Devices

Class-incremental learning with Balanced Embedding Discrimination Maximization

SeDPGK: Semi-supervised software defect prediction with graph representation learning and knowledge distillation

Artistic Style Transfer Based on Attention with Knowledge Distillation

Enhancing accident diagnosis in nuclear power plants through knowledge Distillation: Bridging the gap between simulation and Real-World scenarios

Self-architectural knowledge distillation for spiking neural networks

Deep Learning Model Compression Techniques Performance on Edge Devices

A lightweight deep learning model with knowledge distillation for pulmonary diseases detection in chest X-rays

G-MBRMD: Lightweight liver segmentation model based on guided teaching with multi-head boundary reconstruction mapping distillation

KDGAN: Knowledge distillation‐based model copyright protection for secure and communication‐efficient model publishing

Dual model transfer learning to compensate for individual variability in brain-computer interface

Open-category referring expression comprehension via multi-modal knowledge transfer

A knowledge distillation strategy for enhancing the adversarial robustness of lightweight automatic modulation classification models

Projected Latent Distillation for Data-Agnostic Consolidation in distributed continual learning
