
Bit-mask Robust Contrastive Knowledge Distillation for Unsupervised Semantic Hashing

TL;DR

This paper introduces BRCD, a knowledge distillation method tailored for unsupervised semantic hashing that aligns semantic spaces via contrastive learning, employs a cluster-based approach for robustness, and uses bit masks to reduce redundancy, significantly improving performance across various models and backbones.

Abstract

Unsupervised semantic hashing has emerged as an indispensable technique for fast image search; it aims to convert images into binary hash codes without relying on labels. Recent advancements in the field demonstrate that employing large-scale backbones (e.g., ViT) in unsupervised semantic hashing models can yield substantial improvements. However, the inference delay has become increasingly difficult to overlook. Knowledge distillation provides a means for practical model compression to alleviate this delay. Nevertheless, the prevailing knowledge distillation approaches are not explicitly designed for semantic hashing. They ignore the unique search paradigm of semantic hashing, the inherent necessities of the distillation process, and the property of hash codes. In this paper, we propose an innovative Bit-mask Robust Contrastive knowledge Distillation (BRCD) method, specifically devised for the distillation of semantic hashing models. To ensure the effectiveness of two kinds of search paradigms in the context of semantic hashing, BRCD first aligns the semantic spaces between the teacher and student models through a contrastive knowledge distillation objective. Additionally, to eliminate noisy augmentations and ensure robust optimization, a cluster-based method within the knowledge distillation process is introduced. Furthermore, through a bit-level analysis, we uncover the presence of redundant bits resulting from the bit independence property. To mitigate their effect, we introduce a bit mask mechanism in our knowledge distillation objective. Finally, extensive experiments not only showcase the noteworthy performance of our BRCD method in comparison to other knowledge distillation methods but also substantiate the generality of our method across diverse semantic hashing models and backbones. The code for BRCD is available at https://github.com/hly1998/BRCD.
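
To make the idea of a contrastive distillation objective with a bit mask more concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that): the function names, the InfoNCE-style form of the loss, the temperature value, and the way the mask is supplied are all assumptions made for illustration. The cluster-based filtering of noisy augmentations is not modeled.

```python
import numpy as np


def hamming_distance(a, b):
    """Hamming distance between two codes in {-1, +1}^k via their inner product."""
    k = a.shape[-1]
    return int((k - a @ b) // 2)


def masked_contrastive_kd_loss(student, teacher, bit_mask, temperature=0.5):
    """InfoNCE-style loss: student code i should match teacher code i.

    student, teacher: (n, k) arrays of (relaxed) hash codes in [-1, 1].
    bit_mask: (k,) array of 0/1; bits marked 0 are ignored. How such bits are
              chosen is outside this sketch (hypothetical input).
    """
    kept = bit_mask.sum()
    # Masked inner product, scaled to [-1, 1] by the number of kept bits.
    sim = (student * bit_mask) @ teacher.T / kept
    logits = sim / temperature
    # Numerically stable log-softmax over each row (one row per student code).
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal: same image, teacher vs. student.
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_codes = rng.choice([-1.0, 1.0], size=(8, 16))
    # A toy "student" that agrees with the teacher except for small noise.
    student_codes = np.clip(teacher_codes + 0.1 * rng.standard_normal((8, 16)), -1, 1)
    mask = np.ones(16)
    mask[12:] = 0  # pretend the last four bits were flagged as redundant
    print("loss:", masked_contrastive_kd_loss(student_codes, teacher_codes, mask))
    print("hamming:", hamming_distance(teacher_codes[0], teacher_codes[1]))
```

In this sketch, masked bits contribute nothing to the similarity, so the student is free to set them however it likes. The inner product is used because, for codes in {-1, +1}^k, it determines the Hamming distance used at search time.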

Similar Papers
  • Research Article
  • Citations: 6
  • 10.1016/j.dsp.2024.104512
Discretization and decoupled knowledge distillation for arbitrary oriented object detection
  • Apr 17, 2024
  • Digital Signal Processing
  • Cheng Chen + 2 more

  • Research Article
  • Citations: 5
  • 10.1016/j.asoc.2024.111579
PURF: Improving teacher representations by imposing smoothness constraints for knowledge distillation
  • Apr 9, 2024
  • Applied Soft Computing
  • Md Imtiaz Hossain + 3 more

  • Research Article
  • Citations: 8
  • 10.1016/j.csl.2023.101583
Dual Knowledge Distillation for neural machine translation
  • Nov 9, 2023
  • Computer Speech & Language
  • Yuxian Wan + 4 more

  • Conference Article
  • Citations: 3
  • 10.1109/yac57282.2022.10023833
Spatial-temporal consistency knowledge distillation for real-time semantic segmentation
  • Nov 19, 2022
  • Dongli Wang + 3 more

Real-time semantic segmentation is a key research topic in application fields of artificial intelligence such as autonomous driving and intelligent robots. At present, real-time semantic segmentation models still consume substantial storage space and computing resources. As an efficient model compression method, knowledge distillation is widely used in various fields of computer vision. In this paper, we propose a novel knowledge distillation framework based on a generative adversarial network structure, which combines spatial consistency and temporal consistency. The teacher network in this framework jointly uses a CNN branch and a transformer branch to improve the spatial consistency of the lightweight real-time semantic segmentation performed by the student network. In addition, we integrate the inter-frame relationship obtained by the optical flow network and the semantic segmentation network in continuous time as the temporal consistency constraint of the student network. Finally, spatial consistency and temporal consistency are coupled as spatial-temporal consistency knowledge. The main purpose of our knowledge distillation method is to transfer the spatial-temporal consistency knowledge held by the teacher to the student. The student network obtained by knowledge distillation can process each frame independently in the inference stage, and our knowledge distillation method does not participate in the inference process of the student network, so it does not increase the computational cost of the student network at inference time, yet it can narrow the performance gap in real-time semantic segmentation between large and compact models. Using our method, we obtain a high-performance and efficient lightweight model. Finally, we verify the effectiveness of our proposed method on the CamVid and Cityscapes datasets.

  • Research Article
  • Citations: 48
  • 10.1007/s11263-023-01792-z
Multi-target Knowledge Distillation via Student Self-reflection
  • Apr 25, 2023
  • International Journal of Computer Vision
  • Jianping Gou + 5 more

Knowledge distillation is a simple yet effective technique for deep model compression, which aims to transfer the knowledge learned by a large teacher model to a small student model. To mimic how the teacher teaches the student, existing knowledge distillation methods mainly adopt a unidirectional knowledge transfer, where the knowledge extracted from different intermediate layers of the teacher model is used to guide the student model. However, it turns out that students can learn more effectively through multi-stage learning with self-reflection in real-world education scenarios, which is nevertheless ignored by current knowledge distillation methods. Inspired by this, we devise a new knowledge distillation framework entitled multi-target knowledge distillation via student self-reflection, or MTKD-SSR, which can not only enhance the teacher’s ability in unfolding the knowledge to be distilled, but also improve the student’s capacity of digesting the knowledge. Specifically, the proposed framework consists of three target knowledge distillation mechanisms: a stage-wise channel distillation (SCD), a stage-wise response distillation (SRD), and a cross-stage review distillation (CRD), where SCD and SRD transfer feature-based knowledge (i.e., channel features) and response-based knowledge (i.e., logits) at different stages, respectively; and CRD encourages the student model to conduct self-reflective learning after each stage by a self-distillation of the response-based knowledge. Experimental results on five popular visual recognition datasets, CIFAR-100, Market-1501, CUB200-2011, ImageNet, and Pascal VOC, demonstrate that the proposed framework significantly outperforms recent state-of-the-art knowledge distillation methods.

  • Research Article
  • 10.1109/tmm.2026.3651026
CLIP-SD: CLIP-Enhanced Self-Distillation for Visual Recognition
  • Jan 1, 2026
  • IEEE Transactions on Multimedia
  • Xixi Wang + 3 more

Current knowledge distillation methods typically require significant computational resources and time to train task-specific teacher candidates from scratch and identify the optimal teacher. Although self-distillation methods eliminate the dependency on the teacher by allowing the student model to learn independently, they face two challenges: the student learns correct and incorrect knowledge indiscriminately, and the student's learning scope is limited due to the lack of external teacher supervision. Spurred by these deficiencies, this work proposes a CLIP-enhanced Self-Distillation (CLIP-SD) method to overcome these problems with almost no increase in training time. CLIP-SD comprises two main components: Prediction-oriented Self-Distillation (PSD) and Two-stage Task-guided CLIP Distillation (TTCD). PSD tackles the first challenge by assigning higher and lower weights to correct and incorrect prediction samples, respectively, during self-distillation. This component forces the student to focus on correct knowledge and minimize the impact of incorrect knowledge. Regarding the second challenge, the robust CLIP model is directly introduced into self-distillation. However, CLIP lacks task-specific knowledge and its output is overly smooth during the distillation process, preventing the student from learning more effectively. Therefore, TTCD refines CLIP's output through a two-stage process, endowing it with task-specific knowledge to enhance student learning. Experimental results indicate that CLIP-SD significantly improves distillation performance while maintaining training efficiency comparable to self-distillation. Specifically, on the CIFAR-100 dataset, the performance of CLIP-SD reaches 72.48% when trained with ResNet20 as the student model, which is an average improvement of 2.54% and 1.12% over the knowledge distillation and self-distillation methods, respectively. Regarding training time, CLIP-SD takes 3.91 hours, an average decrease of 2.73 hours compared to knowledge distillation and an average increase of 0.45 hours compared to self-distillation. Despite the slight increase in training time compared to self-distillation, the overhead is small and worthwhile given the performance improvement.

  • Research Article
  • 10.3390/electronics13204102
Multiloss Joint Gradient Control Knowledge Distillation for Image Classification
  • Oct 17, 2024
  • Electronics
  • Wei He + 6 more

Knowledge distillation (KD) techniques aim to transfer knowledge from complex teacher neural networks to simpler student networks. In this study, we propose a novel knowledge distillation method called Multiloss Joint Gradient Control Knowledge Distillation (MJKD), which combines feature- and logit-based knowledge distillation methods with gradient control. The proposed method considers the gradients of the task loss (cross-entropy loss), the feature distillation loss, and the logit distillation loss separately. The experimental results suggest that logits may contain more information and should, consequently, be assigned greater weight during the gradient update process. The empirical findings on the CIFAR-100 and Tiny-ImageNet datasets indicate that MJKD generally outperforms traditional knowledge distillation methods, significantly enhancing the generalization ability and classification accuracy of student networks. For instance, MJKD achieves 63.53% accuracy on Tiny-ImageNet for the ResNet18-MobileNetV2 pair. Furthermore, we present visualizations and analyses to explore its potential working mechanisms.
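
The three loss terms named in this abstract can be illustrated with a small NumPy sketch. This is not MJKD itself: it does not implement any gradient control, and simply forms a weighted sum of a cross-entropy task loss, a mean-squared feature loss, and the widely used temperature-softened logit distillation loss. The function names, the temperature, and the weights (with the logit term weighted highest, loosely echoing the abstract's remark) are hypothetical choices for illustration, and teacher and student features are assumed to already share a shape.

```python
import numpy as np


def log_softmax(z):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def cross_entropy(student_logits, labels):
    """Task loss: mean negative log-likelihood of the true labels."""
    log_p = log_softmax(student_logits)
    return float(-log_p[np.arange(len(labels)), labels].mean())


def logit_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    log_p_t = log_softmax(teacher_logits / temperature)
    log_p_s = log_softmax(student_logits / temperature)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=1)
    return float(temperature**2 * kl.mean())


def feature_kd_loss(student_feat, teacher_feat):
    """Mean squared error between same-shaped student and teacher features."""
    return float(np.mean((student_feat - teacher_feat) ** 2))


def combined_loss(student_logits, teacher_logits, student_feat, teacher_feat,
                  labels, w_task=1.0, w_feat=1.0, w_logit=2.0):
    """Weighted sum of the three terms; the weights are illustrative assumptions."""
    return (w_task * cross_entropy(student_logits, labels)
            + w_feat * feature_kd_loss(student_feat, teacher_feat)
            + w_logit * logit_kd_loss(student_logits, teacher_logits))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s_logits, t_logits = rng.standard_normal((4, 10)), rng.standard_normal((4, 10))
    s_feat, t_feat = rng.standard_normal((4, 32)), rng.standard_normal((4, 32))
    y = rng.integers(0, 10, size=4)
    print("total loss:", combined_loss(s_logits, t_logits, s_feat, t_feat, y))
```

A method that controls each loss's gradient separately would not collapse the terms into one scalar like this; the sketch only shows what each term measures.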

  • Research Article
  • Citations: 128
  • 10.1016/j.media.2022.102693
SSD-KD: A self-supervised diverse knowledge distillation method for lightweight skin lesion classification using dermoscopic images.
  • Feb 1, 2023
  • Medical Image Analysis
  • Yongwei Wang + 5 more

  • Research Article
  • Citations: 11
  • 10.1016/j.knosys.2024.111911
Maximizing discrimination capability of knowledge distillation with energy function
  • May 8, 2024
  • Knowledge-Based Systems
  • Seonghak Kim + 4 more

  • Research Article
  • 10.1049/cvi2.12288
Knowledge distillation of face recognition via attention cosine similarity review
  • May 31, 2024
  • IET Computer Vision
  • Zhuo Wang + 2 more

Deep learning‐based face recognition models have demonstrated remarkable performance in benchmark tests, and knowledge distillation has frequently been used to obtain high‐precision real‐time face recognition models specifically designed for mobile and embedded devices. However, in recent years, knowledge distillation methods for face recognition, which mainly focus on feature or logit knowledge distillation techniques, have neglected the attention mechanism, which plays an important role in neural networks. An innovative attention cosine similarity knowledge distillation method with a cross‐stage connection review path, which unites the attention mechanism with the review knowledge distillation method, is proposed. This method transfers the attention map obtained from the teacher network to the student through a cross‐stage connection path. The efficacy and excellence of the proposed algorithm are demonstrated in popular benchmark tests.

  • Research Article
  • Citations: 12
  • 10.1016/j.neucom.2024.127516
Multi-perspective analysis on data augmentation in knowledge distillation
  • Mar 5, 2024
  • Neurocomputing
  • Wei Li + 3 more

  • Research Article
  • Citations: 11
  • 10.1108/ijwis-10-2023-0192
Efficient knowledge distillation for remote sensing image classification: a CNN-based approach
  • Dec 14, 2023
  • International Journal of Web Information Systems
  • Huaxiang Song + 2 more

Purpose: The paper aims to tackle the classification of Remote Sensing Images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of clustered ground objects and noisy backgrounds. Recent research typically leverages larger volume models to achieve advanced performance. However, the operating environments of remote sensing commonly cannot provide unconstrained computational and storage resources. It requires lightweight algorithms with exceptional generalization capabilities.

Design/methodology/approach: This study introduces an efficient knowledge distillation (KD) method to build a lightweight yet precise convolutional neural network (CNN) classifier. This method also aims to substantially decrease the training time expenses commonly linked with traditional KD techniques. This approach entails extensive alterations to both the model training framework and the distillation process, each tailored to the unique characteristics of RSIs. In particular, this study establishes a robust ensemble teacher by independently training two CNN models using a customized, efficient training algorithm. Following this, this study modifies a KD loss function to mitigate the suppression of non-target category predictions, which are essential for capturing the inter- and intra-similarity of RSIs.

Findings: This study validated the student model, termed KD-enhanced network (KDE-Net), obtained through the KD process on three benchmark RSI data sets. The KDE-Net surpasses 42 other state-of-the-art methods in the literature published from 2020 to 2023. Compared to the top-ranked method’s performance on the challenging NWPU45 data set, KDE-Net demonstrated a noticeable 0.4% increase in overall accuracy with a significant 88% reduction in parameters. Meanwhile, this study’s reformed KD framework significantly enhances the knowledge transfer speed by at least three times.

Originality/value: This study illustrates that the logit-based KD technique can effectively develop lightweight CNN classifiers for RSI classification without substantial sacrifices in computation and storage costs. Compared to neural architecture search or other methods aiming to provide lightweight solutions, this study’s KDE-Net, based on the inherent characteristics of RSIs, is currently more efficient in constructing accurate yet lightweight classifiers for RSI classification.

  • Book Chapter
  • Citations: 18
  • 10.1007/978-3-031-20077-9_19
HEAD: HEtero-Assists Distillation for Heterogeneous Object Detectors
  • Jan 1, 2022
  • Luting Wang + 7 more

Conventional knowledge distillation (KD) methods for object detection mainly concentrate on homogeneous teacher-student detectors. However, the design of a lightweight detector for deployment is often significantly different from a high-capacity detector. Thus, we investigate KD among heterogeneous teacher-student pairs for a wide application. We observe that the core difficulty for heterogeneous KD (hetero-KD) is the significant semantic gap between the backbone features of heterogeneous detectors due to the different optimization manners. Conventional homogeneous KD (homo-KD) methods suffer from such a gap and struggle to obtain satisfactory performance directly for hetero-KD. In this paper, we propose the HEtero-Assists Distillation (HEAD) framework, leveraging heterogeneous detection heads as assistants to guide the optimization of the student detector to reduce this gap. In HEAD, the assistant is an additional detection head with the architecture homogeneous to the teacher head attached to the student backbone. Thus, a hetero-KD is transformed into a homo-KD, allowing efficient knowledge transfer from the teacher to the student. Moreover, we extend HEAD into a Teacher-Free HEAD (TF-HEAD) framework when a well-trained teacher detector is unavailable. Our method has achieved significant improvement compared to current detection KD methods. For example, on the MS-COCO dataset, TF-HEAD helps R18 RetinaNet achieve 33.9 mAP (+2.2), while HEAD further pushes the limit to 36.2 mAP (+4.5).

  • Research Article
  • Citations: 2
  • 10.3390/informatics13010015
Sensor-Drift Compensation in Electronic-Nose-Based Gas Recognition Using Knowledge Distillation
  • Jan 20, 2026
  • Informatics
  • Juntao Lin + 1 more

Environmental changes and sensor aging can cause sensor drift in sensor array responses (i.e., a shift in the measured signal/feature distribution over time), which in turn degrades gas classification performance in real-world deployments of electronic-nose systems. Previous studies using the UCI Gas Sensor Array Drift Dataset as a benchmark reported promising drift compensation results but often lacked robust statistical validation and may overcompensate for drift by suppressing class-discriminative variance. To address these limitations and rigorously evaluate improvements in sensor-drift compensation, we designed two domain adaptation tasks based on the UCI electronic-nose dataset: (1) using the first batch to predict remaining batches, simulating a controlled laboratory setting, and (2) using Batches 1 through n−1 to predict Batch n, simulating continuous training data updates for online training. Then, we systematically tested three methods—our semi-supervised knowledge distillation method (KD) for sensor-drift compensation; a previously benchmarked method, Domain-Regularized Component Analysis (DRCA); and a hybrid method, KD–DRCA—across 30 random test-set partitions on the UCI dataset. We showed that semi-supervised KD consistently outperformed both DRCA and KD–DRCA, achieving up to 18% and 15% relative improvements in accuracy and F1-score, respectively, over the baseline, proving KD’s superior effectiveness in electronic-nose drift compensation. This work provides a rigorous statistical validation of KD for electronic-nose drift compensation under long-term temporal drift, with repeated randomized evaluation and significance testing, and demonstrates consistent improvements over DRCA on the UCI drift benchmark.

  • Conference Article
  • Citations: 5
  • 10.1109/icpr48806.2021.9411995
Efficient Online Subclass Knowledge Distillation for Image Classification
  • Jan 10, 2021
  • Maria Tzelepi + 2 more

Deploying state-of-the-art deep learning models on embedded systems imposes certain storage and computation limitations. In recent years, Knowledge Distillation (KD) has been recognized as a prominent approach to address this issue. That is, KD has been proposed for training fast and compact deep learning models by transferring knowledge from more complex and powerful models. However, knowledge distillation, in its conventional form, involves multiple stages of training, rendering it a computationally and memory demanding procedure. In this paper, a novel single-stage self knowledge distillation method is proposed, namely Online Subclass Knowledge Distillation (OSKD), that aims at revealing the similarities inside classes, so as to improve the performance of any deep neural model in an online manner. Hence, as opposed to existing online distillation methods, we are able to acquire further knowledge from the model itself, without building multiple identical models or using multiple models to teach each other, rendering the proposed OSKD approach more efficient. The experimental evaluation on two datasets validates that the proposed method improves the classification performance.
