Multi-perspective analysis on data augmentation in knowledge distillation

Similar Papers
  • Book Chapter
  • Cited by 9
  • 10.1007/978-3-031-26284-5_31
What Role Does Data Augmentation Play in Knowledge Distillation?
  • Jan 1, 2023
  • Wei Li + 5 more

Knowledge distillation is an effective way to transfer knowledge from a large model to a small model, which can significantly improve the performance of the small model. In recent years, some contrastive learning-based knowledge distillation methods (e.g., SSKD and HSAKD) have achieved excellent performance by utilizing data augmentation. However, the value of data augmentation has long been overlooked by researchers in knowledge distillation, and no work analyzes its role in detail. To fill this gap, we analyze the effect of data augmentation on knowledge distillation from multiple perspectives. In particular, we demonstrate the following properties of data augmentation: (a) data augmentation can effectively help knowledge distillation work even if the teacher model has no information about the augmented samples, and our proposed diverse and rich Joint Data Augmentation (JDA) is more effective than rotation alone in knowledge distillation; (b) using diverse and rich augmented samples to assist the teacher model in training can improve its performance, but not the performance of the student model; (c) the student model can achieve excellent performance when the proportion of augmented samples is within a suitable range; (d) data augmentation enables knowledge distillation to work better in a few-shot scenario; (e) data augmentation is seamlessly compatible with some knowledge distillation methods and can potentially further improve their performance. Motivated by the above analysis, we propose a method named Cosine Confidence Distillation (CCD) to transfer the augmented samples’ knowledge more reasonably. CCD achieves better performance than the latest SOTA method, HSAKD, with lower storage requirements on CIFAR-100 and ImageNet-1k. Our code is released at https://github.com/liwei-group/CCD.
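
For readers unfamiliar with the setup the abstract above refers to, the following is a minimal sketch of the classic temperature-scaled soft-label distillation loss, averaged over an original sample and an augmented view of it. It is not the CCD or JDA method from the paper. The function names, the temperature value, and the logits are all illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student soft predictions
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def batch_kd_loss(student_batch, teacher_batch, temperature=4.0):
    """Mean distillation loss over paired student/teacher logit vectors."""
    losses = [kd_loss(s, t, temperature) for s, t in zip(student_batch, teacher_batch)]
    return sum(losses) / len(losses)


# Hypothetical logits for one original image and one augmented view of it.
teacher_batch = [[2.0, 0.5, -1.0], [1.2, 0.9, -0.3]]
student_batch = [[1.5, 0.7, -0.8], [0.4, 1.1, 0.0]]
print(batch_kd_loss(student_batch, teacher_batch))
```

In this sketch the augmented view is simply treated as an extra training sample whose teacher output supplies the soft target; how much weight such samples should receive is the kind of question the abstract's analysis addresses.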

  • Research Article
  • Cited by 6
  • 10.1016/j.dsp.2024.104512
Discretization and decoupled knowledge distillation for arbitrary oriented object detection
  • Apr 17, 2024
  • Digital Signal Processing
  • Cheng Chen + 2 more

  • Research Article
  • Cited by 4
  • 10.1109/tip.2024.3445740
Relation Knowledge Distillation by Auxiliary Learning for Object Detection
  • Jan 1, 2024
  • IEEE Transactions on Image Processing
  • Hao Wang + 3 more

Balancing the trade-off between accuracy and speed, so as to obtain higher performance without sacrificing inference time, is a challenging topic in object detection. Knowledge distillation, a model compression technique, offers a feasible way to address this efficiency and effectiveness issue by transferring the dark knowledge from a sophisticated teacher detector to a simple student one. Despite offering promising ways to reconcile accuracy and speed, current knowledge distillation methods for object detection still suffer from two limitations. First, most of the methods are inherited or borrowed from frameworks for image classification and work implicitly, by imitating or constraining the intermediate-layer features or the output predictions between the teacher and student models, while little consideration has been given to the intrinsic relevance of the classification and localization predictions in object detection. Second, these methods fail to investigate the relationship between the detection and distillation tasks in the knowledge distillation pipeline, and they train the whole network by simply combining the losses from these two tasks through hand-crafted weighting parameters. To address these issues, we propose a novel Relation Knowledge Distillation by Auxiliary Learning for Object Detection (ReAL) method. Specifically, we first design a prediction relation distillation module that makes the student model directly mimic the output predictions of the teacher, and apply self and mutual relation distillation losses to mine the relation information between the teacher and student models. Moreover, to better delve into the relationship between the different tasks in the distillation pipeline, we introduce auxiliary learning into knowledge distillation for object detection and develop a dynamic weight adaptation strategy. By regarding detection as the primary task and distillation as the auxiliary task in an auxiliary learning framework, we dynamically adjust and regularize the corresponding loss weights during training. Experiments on the MS COCO dataset are conducted using various teacher-student detector combinations, and the results show that ReAL achieves clear improvements across different distillation model configurations while performing favorably against state-of-the-art methods.

  • Research Article
  • 10.62051/ijcsit.v4n3.36
A Review of the Lightweight Technology of Object Detection Algorithms
  • Dec 21, 2024
  • International Journal of Computer Science and Information Technology
  • Zexi Tan

Deep learning has made significant progress in the field of object detection; convolutional neural networks in particular have performed well in image classification, object detection, and segmentation tasks. However, with the increasing complexity of models and their growing demand for computing resources, traditional deep learning models are difficult to deploy on resource-constrained mobile and embedded devices. To solve this problem, model compression and acceleration techniques, including pruning, quantization, and knowledge distillation, have become a research hotspot. The purpose of this paper is to review various algorithms in the field of object detection along with their advantages and disadvantages, and to discuss the best optimization scheme based on how lightweight techniques are applied in these algorithms and how well they work. The research objectives are to systematically summarize and analyze the main lightweight techniques currently used for object detection algorithms, evaluate their practical effects in object detection tasks, propose improvement schemes suited to specific application scenarios, and discuss future development directions, potential research topics, and technological breakthroughs.

  • Research Article
  • Cited by 5
  • 10.1016/j.asoc.2024.111579
PURF: Improving teacher representations by imposing smoothness constraints for knowledge distillation
  • Apr 9, 2024
  • Applied Soft Computing
  • Md Imtiaz Hossain + 3 more

  • Research Article
  • Cited by 53
  • 10.1016/j.knosys.2022.108136
Multi-level knowledge distillation for low-resolution object detection and facial expression recognition
  • Jan 10, 2022
  • Knowledge-Based Systems
  • Tingsong Ma + 2 more

  • Research Article
  • 10.1049/csy2.70002
Big2Small: Learning from masked image modelling with heterogeneous self‐supervised knowledge distillation
  • Dec 1, 2024
  • IET Cyber-Systems and Robotics
  • Ziming Wang + 5 more

Small convolutional neural network (CNN)‐based models usually require transferring knowledge from a large model before they are deployed on computationally resource‐limited edge devices. Masked image modelling (MIM) methods achieve great success in various visual tasks but remain largely unexplored in knowledge distillation for heterogeneous deep models. This is mainly due to the significant discrepancy between the transformer‐based large model and the CNN‐based small network. In this paper, the authors develop the first heterogeneous self‐supervised knowledge distillation (HSKD) method based on MIM, which can efficiently transfer knowledge from large transformer models to small CNN‐based models in a self‐supervised fashion. Our method builds a bridge between transformer‐based models and CNNs by training a UNet‐style student with sparse convolution, which can effectively mimic the visual representation inferred by a teacher over masked modelling. Our method is a simple yet effective learning paradigm to learn the visual representation and distribution of data from heterogeneous teacher models, which can be pre‐trained using advanced self‐supervised methods. Extensive experiments show that it adapts well to various models and sizes, consistently achieving state‐of‐the‐art performance in image classification, object detection, and semantic segmentation tasks. For example, on the ImageNet 1K dataset, HSKD improves the accuracy of ResNet‐50 (sparse) from 76.98% to 80.01%.

  • Research Article
  • Cited by 10
  • 10.1016/j.knosys.2024.111911
Maximizing discrimination capability of knowledge distillation with energy function
  • May 8, 2024
  • Knowledge-Based Systems
  • Seonghak Kim + 4 more

  • Research Article
  • Cited by 1
  • 10.1088/1742-6596/2171/1/012058
Attention Based Data Augmentation for Knowledge Distillation with Few Data
  • Jan 1, 2022
  • Journal of Physics: Conference Series
  • Shengzhao Tian + 1 more

Knowledge distillation has attracted great attention from computer vision researchers in recent years. However, the performance of the student model suffers when the complete dataset used to train the teacher model is unavailable. Especially when distilling knowledge between heterogeneous models, it is difficult for the student model to learn and receive guidance from only a few data. In this paper, a data augmentation method is proposed based on the attentional response of the teacher model. The proposed method utilizes the knowledge in the teacher model without requiring the teacher and student models to share a homogeneous architecture. Experimental results demonstrate that combining the proposed data augmentation method with different knowledge distillation methods improves the performance of the student model in knowledge distillation with few data.

  • Conference Article
  • Cited by 4
  • 10.1145/3589334.3645440
Bit-mask Robust Contrastive Knowledge Distillation for Unsupervised Semantic Hashing
  • May 13, 2024
  • Liyang He + 6 more

Unsupervised semantic hashing has emerged as an indispensable technique for fast image search, which aims to convert images into binary hash codes without relying on labels. Recent advancements in the field demonstrate that employing large-scale backbones (e.g., ViT) in unsupervised semantic hashing models can yield substantial improvements. However, the inference delay has become increasingly difficult to overlook. Knowledge distillation provides a means for practical model compression to alleviate this delay. Nevertheless, the prevailing knowledge distillation approaches are not explicitly designed for semantic hashing. They ignore the unique search paradigm of semantic hashing, the inherent necessities of the distillation process, and the property of hash codes. In this paper, we propose an innovative Bit-mask Robust Contrastive knowledge Distillation (BRCD) method, specifically devised for the distillation of semantic hashing models. To ensure the effectiveness of two kinds of search paradigms in the context of semantic hashing, BRCD first aligns the semantic spaces between the teacher and student models through a contrastive knowledge distillation objective. Additionally, to eliminate noisy augmentations and ensure robust optimization, a cluster-based method within the knowledge distillation process is introduced. Furthermore, through a bit-level analysis, we uncover the presence of redundancy bits resulting from the bit independence property. To mitigate these effects, we introduce a bit mask mechanism in our knowledge distillation objective. Finally, extensive experiments not only showcase the noteworthy performance of our BRCD method in comparison to other knowledge distillation methods but also substantiate the generality of our methods across diverse semantic hashing models and backbones. The code for BRCD is available at https://github.com/hly1998/BRCD.

  • Research Article
  • 10.11834/jig.210337
A survey of model distillation methods for high-level semantic analysis
  • Jan 1, 2023
  • Journal of Image and Graphics
  • Sun Ruoyu + 1 more

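The goal of computer vision is to build computational models that approach the human visual system. With the development of deep neural networks (DNNs), the analysis and understanding of high-level semantics in computer vision has become a research focus. High-level semantics in computer vision are typically human-understandable, expressible descriptors that convey the content of media signals such as images and videos; typical high-level semantic analysis tasks include image classification, object detection, instance segmentation, semantic segmentation, video scene recognition, and object tracking. DNN-based algorithms have steadily improved performance on computer vision tasks, but this has come with larger models and lower computational efficiency. Model distillation is a model compression approach based on transfer learning. Such approaches typically use a pre-trained model as the teacher, extract its effective representations (such as model outputs, hidden-layer features, or similarities between features), and use these representations as additional supervision signals to train a smaller, faster student model, with the aim of improving the small model's performance so that it can replace the large one. Because model distillation offers a good trade-off between model performance and computational complexity, it is increasingly used in deep learning-based high-level semantic analysis. Since the concept of model distillation was proposed in 2014, researchers have developed a large number of model distillation methods for high-level semantic analysis, most widely for image classification, object detection, and semantic segmentation. This paper surveys and summarizes representative model distillation schemes for these typical tasks, organized by vision task. First, starting from model distillation methods for classification, the most mature and widely applied, it introduces their different design ideas and application scenarios, presents comparisons of some experimental results, and points out how the conditions for applying model distillation differ between classification and detection or segmentation tasks. Next, it introduces several typical model distillation methods specially designed for object detection and semantic segmentation, explains their design goals and ideas in light of the model structures, and provides a comparison and analysis of some experimental results. Finally, it summarizes and analyzes the current state of model distillation methods in high-level semantic analysis, points out existing difficulties and shortcomings, and envisions possible future directions for exploration and development.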

  • Front Matter
  • Cited by 18
  • 10.1016/j.esmoop.2022.100429
Area under the curve may hide poor generalisation to external datasets
  • Apr 1, 2022
  • ESMO Open
  • A Kleppe

  • Research Article
  • Cited by 8
  • 10.1016/j.csl.2023.101583
Dual Knowledge Distillation for neural machine translation
  • Nov 9, 2023
  • Computer Speech & Language
  • Yuxian Wan + 4 more

  • Book Chapter
  • Cited by 385
  • 10.1007/978-3-030-01240-3_21
DetNet: Design Backbone for Object Detection
  • Jan 1, 2018
  • Zeming Li + 5 more

Recent CNN-based object detectors, whether one-stage methods like YOLO, SSD, and RetinaNet, or two-stage detectors like Faster R-CNN, R-FCN, and FPN, are usually fine-tuned directly from ImageNet pre-trained models designed for image classification. However, there has been little work discussing backbone feature extractors specifically designed for object detection. More importantly, there are several differences between the tasks of image classification and object detection. (i) Recent object detectors like FPN and RetinaNet usually involve extra stages, compared with image classification networks, to handle objects at various scales. (ii) Object detection needs not only to recognize the category of the object instances but also to locate them spatially. Large downsampling factors bring a large valid receptive field, which is good for image classification but compromises the ability to localize objects. Due to the gap between image classification and object detection, we propose DetNet in this paper, a novel backbone network specifically designed for object detection. Moreover, DetNet includes extra stages compared with traditional backbone networks for image classification, while maintaining high spatial resolution in deeper layers. Without any bells and whistles, state-of-the-art results have been obtained for both object detection and instance segmentation on the MSCOCO benchmark based on our DetNet (4.8G FLOPs) backbone. Codes will be released (https://github.com/zengarden/DetNet).

  • Research Article
  • Cited by 32
  • 10.1109/tip.2021.3101158
Resolution-Aware Knowledge Distillation for Efficient Inference
  • Jan 1, 2021
  • IEEE Transactions on Image Processing
  • Zhanxiang Feng + 2 more

Minimizing computational complexity is essential for the adoption of deep networks in practical applications. Currently, most research attempts to accelerate deep networks by designing new network structures or compressing the network parameters. Meanwhile, transfer learning techniques such as knowledge distillation are used to preserve the performance of deep models. In this paper, we focus on accelerating deep models and relieving the computation burden by using low-resolution (LR) images as inputs while maintaining competitive performance, which is rarely researched in the current literature. Deep networks may suffer serious performance degradation with LR inputs because many details are unavailable in LR images. Besides, existing approaches may fail to learn discriminative features for LR images because of the dramatic appearance variations between LR and high-resolution (HR) images. To tackle these problems, we propose a resolution-aware knowledge distillation (RKD) framework that narrows the cross-resolution variations by transferring knowledge from the HR domain to the LR domain. The proposed framework consists of an HR teacher network and an LR student network. First, we introduce a discriminator and propose an adversarial learning strategy to shrink the variations between inputs of differing resolution. Then we design a cross-resolution knowledge distillation (CRKD) loss to train a discriminative student network by exploiting the knowledge of the teacher network. The CRKD loss consists of a resolution-aware distillation loss, a pair-wise constraint, and a maximum mean discrepancy loss. Experimental results on person re-identification, image classification, face recognition, and defect segmentation tasks demonstrate that RKD outperforms the traditional knowledge distillation method, achieving better performance with lower computational complexity. Furthermore, CRKD surpasses state-of-the-art knowledge distillation methods in transferring knowledge across different resolutions under the RKD framework, especially when coping with large resolution differences.
