Articles published on Knowledge Distillation
4403 Search results
- Research Article
- 10.1016/j.iswa.2026.200638
- May 1, 2026
- Intelligent Systems with Applications
- Mulugeta Adibaru Kiflie
Leveraging knowledge distillation for lightweight and interpretable deep learning in Ethiopian medicinal plant classification
- Research Article
- 10.1016/j.neunet.2025.108482
- May 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Changze Lv + 8 more
SpikeBERT: A language spikformer learned from BERT with knowledge distillation.
- Research Article
- 10.1002/jemt.70104
- May 1, 2026
- Microscopy research and technique
- Anandh Sam Chandra Bose + 2 more
Prostate cancer is a prevalent and serious health concern, ranking among the most frequently diagnosed cancers and a leading cause of cancer-related deaths in men worldwide. Early detection and accurate diagnosis are crucial for improving patient outcomes by limiting disease progression. Histopathological image analysis remains the gold standard for prostate cancer detection; however, manual interpretation is time-consuming and requires specialized expertise. To address these challenges, this study proposes a hybrid deep learning framework that combines an ensemble of transfer-learned CNNs (VGG-16, DenseNet-121, and AlexNet) with a fine-tuned Vision Transformer (ViT). The CNN ensemble extracts rich local features, while the ViT captures global contextual dependencies through a self-attention mechanism and a multilayer perceptron (MLP). Additionally, a cross-attention fusion (CAF) module integrates local and global features, and knowledge distillation (KD) yields a lightweight student network suitable for efficient clinical deployment. The study uses the publicly available PANDA dataset for training and testing. Preprocessing steps, including patch generation, gamma correction, and stain deconvolution, enhance image quality and feature representation. A comprehensive evaluation was conducted using standard performance metrics such as accuracy, true positive rate (TPR), true negative rate (TNR), precision, F1-score, false negative rate (FNR), and false positive rate (FPR). An ablation study confirmed the contribution of each module, highlighting the critical role of the CNN ensemble, CAF, and ViT in improving performance. Experimental results demonstrate that the proposed model outperforms conventional transfer learning models and existing state-of-the-art techniques, achieving 97.91% accuracy together with higher TPR and TNR and lower FNR and FPR.
The computational complexity, evaluated in terms of parameters, FLOPs, GPU memory, and inference time, indicates that the proposed model is more demanding than traditional CNNs. Nevertheless, the architecture strikes a practical balance between predictive accuracy and efficiency, making it suitable for real-world clinical applications. These findings underscore the potential of AI-powered hybrid models in expediting prostate cancer diagnosis and enabling timely intervention for improved patient outcomes.
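The abstract does not state which distillation objective trains the lightweight student. For reference only, the sketch below implements the widely used soft-target loss introduced by Hinton et al.: a KL divergence between temperature-softened teacher and student outputs, scaled by T², plus ordinary cross-entropy on the ground-truth label. The function names, the temperature, and the weight `alpha` are illustrative assumptions, not the authors' settings.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Soft-target distillation loss for one example (hypothetical hyperparameters).

    alpha weights the soft (teacher) term and (1 - alpha) weights the hard-label
    cross-entropy. The KL term is multiplied by T^2 so its gradient scale stays
    comparable to the cross-entropy term as the temperature changes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(p_teacher || p_student) on the softened distributions.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Standard cross-entropy against the ground-truth label at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [8.0, 2.0, 0.5]
    student = [5.0, 3.0, 1.0]
    print(f"KD loss: {kd_loss(student, teacher, label=0):.4f}")
```

Setting `alpha=0` reduces this to plain supervised training, while a higher temperature exposes more of the teacher's relative confidence across the non-target classes.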
- Research Article
- 10.1016/j.neunet.2025.108444
- May 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Yue Zhou + 3 more
Enhancing end-to-end speech translation via multi-stage knowledge distillation.
- Research Article
- 10.1016/j.neunet.2025.108505
- May 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Zheng Zhongzhu + 3 more
Enhancing graph neural networks through universal self-knowledge distillation.
- Research Article
- 10.1016/j.engappai.2026.114334
- May 1, 2026
- Engineering Applications of Artificial Intelligence
- Longtao Chen + 5 more
Enhancing Few-Shot marble slab surface defect detection: A diffusion framework with knowledge distillation and semantic guidance
- Research Article
- 10.1109/jiot.2026.3664119
- May 1, 2026
- IEEE Internet of Things Journal
- Shengcai Zhang + 2 more
Real-time intrusion detection with millisecond response is critical for Internet of Vehicles (IoV) security but is challenged by extreme class imbalance and high computational costs. This paper proposes a novel multimodal framework integrating Denoising Diffusion Probabilistic Models (DDPM) and Knowledge Distillation (KD). First, multi-source data are transformed into RGB images. A conditional DDPM with timestep and class embeddings balances the datasets by generating minority-class samples. The teacher model (DiffuGuardian) fuses text-image features for training. Subsequently, a lightweight student model, LiteSentinel, is built from depthwise separable convolutions and inverted residual blocks to reduce the parameter count. Results on three datasets demonstrate that DiffuGuardian consistently achieves around 98–100% precision, accuracy, recall, and F1-score under 5-fold evaluation, while LiteSentinel maintains approximately 95–99% across all metrics with substantially reduced complexity. DiffuGuardian reaches an inference time of 3.80 ms with a model size of 0.10 MB, whereas LiteSentinel further reduces latency to 0.79 ms with a size of 0.07 MB, enabling efficient edge deployment for IoV security.
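The abstract credits LiteSentinel's smaller size partly to depthwise separable convolutions. The parameter arithmetic behind that choice is easy to check: a standard k×k convolution needs k·k·C_in·C_out weights, whereas a depthwise k×k convolution followed by a 1×1 pointwise convolution needs k·k·C_in + C_in·C_out. The layer sizes below are hypothetical examples, not LiteSentinel's actual configuration, and batch-norm parameters are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Weights in a k x k convolution mapping c_in channels to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Depthwise k x k conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
    std = standard_conv_params(3, 64, 128)
    sep = depthwise_separable_params(3, 64, 128)
    print(f"standard={std}, separable={sep}, reduction={std / sep:.2f}x")
    # prints: standard=73728, separable=8768, reduction=8.41x
```

The saving grows with the number of output channels, because the pointwise term C_in·C_out replaces the much larger k·k·C_in·C_out.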
- Research Article
- 10.1016/j.bspc.2026.109447
- May 1, 2026
- Biomedical Signal Processing and Control
- Wajid Ali + 3 more
Enhancing cancer detection with a lightweight knowledge distillation approach for Multi-Class image classification
- Research Article
- 10.1016/j.eswa.2026.131146
- May 1, 2026
- Expert Systems with Applications
- Muhao Xu + 7 more
Beyond feature mapping: Dual-heterogeneous knowledge distillation with mamba for industrial anomaly detection
- Research Article
- 10.1016/j.media.2026.104005
- May 1, 2026
- Medical image analysis
- Xinyu Hao + 5 more
Dual selective gleason pattern-aware multiple instance learning with uncertainty regularization for grade group prediction in histopathology images.
- Research Article
- 10.1016/j.patrec.2026.03.006
- May 1, 2026
- Pattern Recognition Letters
- Bo Wang + 4 more
Beyond data dependency: FedPET enables robust federated learning via data-free dual-teacher knowledge distillation
- Research Article
- 10.1016/j.knosys.2026.115690
- May 1, 2026
- Knowledge-Based Systems
- Ning Li + 5 more
Context-aware knowledge distillation for anomaly detection
- Research Article
- 10.1016/j.eswa.2026.131393
- May 1, 2026
- Expert Systems with Applications
- Bolei He + 3 more
D2A2: Enhancing LLM knowledge distillation efficiency and performance with difficulty-aware and adaptive distillation framework
- Research Article
- 10.1016/j.inffus.2025.104043
- May 1, 2026
- Information Fusion
- Menggang Kou + 5 more
PMFM-kdTransformer: An enhanced multi-modal fusion architecture leveraging knowledge distillation for intra-hour solar irradiance prediction
- Research Article
- 10.1016/j.isprsjprs.2026.03.019
- May 1, 2026
- ISPRS Journal of Photogrammetry and Remote Sensing
- Medhavi Mishra + 2 more
Wildfire detection and localization in aerial imagery are critical for rapid response and damage mitigation. Autonomous aerial vehicles (AAVs) enable large-area monitoring but face real-time processing challenges due to limited onboard computational and sensor resources. This work introduces a cross-modal knowledge distillation framework for edge-deployed AAVs. A teacher network trained only on thermal images transfers semantic and spatial representations to a student network that operates on optical images; this transfer happens offline, using paired thermal and optical images. During deployment, the student uses only optical images, reducing reliance on multi-sensor payloads while maintaining high detection accuracy. The student model incorporates two classification heads: an image-level head that separates fire-free from fire-impacted scenes, and a patch-level head that discriminates flame from no-flame patches. The patch-level strategy provides effective fire localization without the computational overhead of segmentation, making it practical for resource-constrained deployment. Evaluated on an aerial wildfire dataset, the framework achieves 90.97% patch-level accuracy, with false alarm and missed detection rates of 8.82% and 14.78%, respectively. The lightweight student model requires only 2.99 GFLOPs, with an inference time of 0.004 s, and generates patch-level probability heatmaps for fire region localization. Unlike conventional unimodal systems, this approach leverages thermal-to-optical knowledge transfer to deliver high accuracy, low latency, and precise localization under edge-computing constraints. The code and dataset will be released at https://github.com/medh132/cmkd.
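The abstract does not specify the loss used to transfer the thermal teacher's representations. One common, minimal choice for feature-level distillation is a mean squared error between the frozen teacher's features on a thermal image and the student's features on the paired optical image; the sketch below assumes that choice and equal-length feature vectors. It also shows how flat patch-level flame probabilities could be arranged into a grid heatmap. The function names, the row-major patch order, the grid width, and the 0.5 threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_distillation_loss(student_feats: Sequence[float],
                              teacher_feats: Sequence[float]) -> float:
    """MSE between student features (optical input) and teacher features (paired thermal input).

    The teacher is assumed frozen, so in training only the student would receive gradients.
    """
    if not student_feats or len(student_feats) != len(teacher_feats):
        raise ValueError("feature vectors must be non-empty and equal in length")
    return sum((s - t) ** 2 for s, t in zip(student_feats, teacher_feats)) / len(student_feats)


def patch_heatmap(patch_probs: Sequence[float], grid_width: int,
                  threshold: float = 0.5) -> Tuple[List[List[float]], List[List[bool]]]:
    """Arrange row-major per-patch flame probabilities into a 2-D grid and flag likely flame patches."""
    if grid_width <= 0 or len(patch_probs) % grid_width != 0:
        raise ValueError("number of patches must be a multiple of a positive grid_width")
    rows = [list(patch_probs[i:i + grid_width]) for i in range(0, len(patch_probs), grid_width)]
    flags = [[p >= threshold for p in row] for row in rows]
    return rows, flags


if __name__ == "__main__":
    print(feature_distillation_loss([0.2, 0.8, 0.5], [0.1, 0.9, 0.5]))
    heat, fire = patch_heatmap([0.05, 0.10, 0.70, 0.92, 0.30, 0.64], grid_width=3)
    for row in fire:
        print("".join("F" if flagged else "." for flagged in row))
```

Keeping the patch head as a per-patch classifier, rather than a per-pixel segmentation decoder, is what keeps this localization step cheap.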
- Research Article
- 10.1016/j.neucom.2026.133285
- May 1, 2026
- Neurocomputing
- Hui Zhou + 4 more
MRKD-PBCL: Multi-level region-wise knowledge distillation and prototype balanced contrastive learning for class incremental semantic segmentation
- Research Article
- 10.1109/tpami.2025.3650545
- May 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Jianjian Cao + 3 more
Vision-Language Transformers (VLTs) have achieved remarkable success, yet their high computational costs, driven by numerous input tokens and large parameter counts, remain a challenge. Existing VLT compression methods primarily rely on single-modality token pruning or coarse-grained weight pruning. However, these methods face significant obstacles: they ignore the critical alignment between modalities and lack the flexibility of layer-wise dynamic token pruning, they suffer unavoidable performance degradation from coarse-grained weight pruning, and they struggle to compress input tokens and model parameters simultaneously. To address these limitations, we propose MADTP++, a novel approach that integrates custom-made token and weight pruning processes into a unified framework, achieving superior compression in both parameter counts and computational costs. Specifically, for token pruning, we introduce the Multi-modality Alignment Guidance (MAG) module and the Dynamic Token Pruning (DTP) module, which align semantic features across modalities and guide the dynamic elimination of redundant tokens for each input instance. For weight pruning, we propose a Hardware-aware Weight Pruning (HWP) module that leverages Sparse Tensor Cores across diverse hardware setups to enable fine-grained parameter pruning within VLTs. To further unify token and weight pruning, we also propose a Cooperative Optimization Training Strategy that automatically allocates GFLOPs and parameter reductions per branch before pruning and employs Knowledge Distillation Constraints to facilitate joint optimization of both pruning dimensions. Extensive experiments on various VLT models and datasets demonstrate that MADTP++ significantly reduces model parameters and computational costs while maintaining competitive performance.
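The abstract says HWP relies on Sparse Tensor Cores for fine-grained pruning but does not describe the procedure. NVIDIA's Sparse Tensor Cores (Ampere and later) accelerate a 2:4 pattern, in which two of every four contiguous weights are zero. The sketch below applies simple magnitude-based 2:4 pruning to a flat weight list; treating HWP as magnitude 2:4 pruning is an assumption made purely for illustration, and the function name is hypothetical. The paper's strategy additionally allocates reductions per branch and uses knowledge distillation constraints during training.

```python
from typing import List, Sequence


def prune_2_of_4(weights: Sequence[float]) -> List[float]:
    """Magnitude-based 2:4 structured pruning (illustrative, not the HWP module itself).

    In every contiguous group of four weights, the two with the smallest absolute
    value are set to zero, producing the 50% sparsity pattern that Sparse Tensor
    Cores can exploit. Ties are broken by position (earlier index pruned first).
    """
    if len(weights) % 4 != 0:
        raise ValueError("number of weights must be a multiple of 4")
    pruned = [float(w) for w in weights]
    for start in range(0, len(weights), 4):
        group = range(start, start + 4)
        # Indices of the two smallest-magnitude weights in this group.
        for i in sorted(group, key=lambda j: abs(weights[j]))[:2]:
            pruned[i] = 0.0
    return pruned


if __name__ == "__main__":
    w = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
    print(prune_2_of_4(w))  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because the zero positions are constrained to a fixed per-group pattern, hardware can store only the surviving values plus small indices, which is what distinguishes this from unstructured sparsity.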
- Research Article
- 10.1016/j.knosys.2026.115713
- May 1, 2026
- Knowledge-Based Systems
- Pengchen Liang + 4 more
Task-specific knowledge distillation from the vision foundation model for enhanced medical image segmentation
- Research Article
- 10.1016/j.future.2025.108253
- May 1, 2026
- Future Generation Computer Systems
- Jiali Zheng + 3 more
CFLKD: Clustered federated learning via cross-group knowledge distillation
- Research Article
- 10.1016/j.asoc.2026.114860
- May 1, 2026
- Applied Soft Computing
- Yong Ho Lee + 5 more
Knowledge distillation for super-resolution reconstruction and segmentation in forward-facing camera images