Articles published on knowledge-distillation
4488 Search results
- Research Article
- 10.3390/make8040100
- Apr 13, 2026
- Machine Learning and Knowledge Extraction
- Hafida Hidani + 2 more
The rapid growth of Arabic social media content requires the development of accurate and efficient methods for sentiment analysis. We propose a resource-efficient multi-task learning (MTL) framework for modern standard Arabic (MSA). The model uses a shared AraBERT encoder to jointly predict emotion, polarity, and intention. We integrate knowledge distillation (KD) from a large teacher model, self-distillation (SD) using model self-ensembling, and adversarial training (AT) as a regularization strategy. Experiments conducted on an annotated corpus of MSA tweets demonstrate that all distilled models outperform a fine-tuned multi-task baseline, and the combined KD+SD+AT configuration achieves competitive results. For instance, KD alone raised Macro F1 for emotion from 0.83 to 0.88 and for intention from 0.67 to 0.72. KD+SD+AT achieved the best intention F1 (0.76) and the highest polarity F1 (0.90). Notably, F1-scores for several minority classes show consistent improvement, particularly under KD and combined configurations. Paired t-tests confirm that several improvements, especially those obtained with KD and KD+SD+AT, are statistically significant (p<0.05). Our results indicate that distillation, combined with adversarial regularization, enables the development of smaller and more efficient Arabic sentiment models while maintaining competitive accuracy. These findings address a gap in Arabic multi-task sentiment analysis and provide a scalable, resource-efficient framework, along with empirical insights for distillation in Arabic language models.
- Research Article
- 10.54254/2977-3903/2026.32757
- Apr 13, 2026
- Advances in Engineering Innovation
- Zhi Cao
The rapid proliferation of Internet of Things (IoT) devices necessitates the deployment of lightweight Network Intrusion Detection Systems (NIDS) at the network edge. Knowledge Distillation (KD) has emerged as a prevailing paradigm to compress cumbersome deep learning models for resource-constrained environments. However, existing KD approaches—whether employing full-representation transfer or static gradient masking—indiscriminately distill both generalized attack signatures and dataset-specific environment noise. This rigid feature entanglement leads to severe negative transfer, exacerbates the catastrophic forgetting of long-tail attack categories, and dramatically degrades cross-environment generalization. To overcome these critical limitations, we propose Disentangled Representation Distillation (DRD), a novel framework that fundamentally alters the nature of transferred knowledge. DRD compels the high-capacity teacher model to decouple high-dimensional network traffic representations into two orthogonal latent sub-spaces: pure Attack Semantics and Environment Background. During the distillation phase, dataset-specific background noise is explicitly discarded, allowing the student model to exclusively inherit purified attack logic. This mechanism circumvents the computational overhead of dynamic masking while equipping the student with inherent robustness against domain shifts. Comprehensive experiments across benchmark datasets (e.g., UNSW-NB15 and CICIDS2017) demonstrate that DRD significantly improves the detection of rare, long-tail attacks, achieves unprecedented cross-dataset generalization, and maintains an ultra-lightweight footprint suitable for edge deployment.
- Research Article
- 10.3390/app16083795
- Apr 13, 2026
- Applied Sciences
- Peng Huang + 6 more
To improve detection accuracy for color-sensitive and small-target defects in steel cord ply, this paper introduces an improved YOLOv8s algorithm using multi-teacher stepwise hierarchical knowledge distillation for better adaptation across production lines. The improvements include: replacing the initial backbone convolutional layer with RGBV grouped convolution to enhance color feature extraction; substituting the SPPF module with SPPFCSPC-LSKA to improve multi-scale perception; and optimizing bounding box accuracy with the WIoU loss function. The multi-teacher distillation approach first transfers color feature learning using an RGBV-only teacher, then multi-scale feature learning with an SPPFCSPC-LSKA-only teacher. Experimental results show the improved model achieved 90.4% precision, 92.0% recall, 91.2% F1-score, and 97.2% mAP@0.5, surpassing the baseline YOLOv8s by 1.9, 2.2, 2.1, and 3.4 percentage points, respectively. The proposed model also achieves an inference time of 3.9 ms, representing a 1.0 ms reduction compared to the baseline. On a smaller dataset from another production line, single-teacher distillation increased precision, recall, F1-score, and mAP@0.5 to 84.6%, 82.0%, 83.3%, and 88.8%, respectively, albeit with an increase in inference time. The multi-teacher strategy further increased metrics to 97.5% precision, 88.8% recall, 92.9% F1-score, and 94.3% mAP@0.5, providing additional gains over single-teacher distillation while maintaining the same parameter count of 11.127 M and achieving a faster inference time of 4.1 ms on the target production line.
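The F1-scores reported above can be cross-checked against the stated precision and recall, since F1 is the harmonic mean of the two. The sketch below is only an illustrative check; the function name `f1_score` and the dictionary labels are assumptions and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %) pairs as reported in the abstract;
# the labels are illustrative names for the three reported settings.
reported = {
    "improved model": (90.4, 92.0),
    "single-teacher": (84.6, 82.0),
    "multi-teacher": (97.5, 88.8),
}

# Round to one decimal place to match the precision used in the abstract.
f1_values = {name: round(f1_score(p, r), 1) for name, (p, r) in reported.items()}
print(f1_values)
```

Rounded to one decimal place, the three results agree with the 91.2%, 83.3%, and 92.9% F1-scores given in the abstract.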
- Research Article
- 10.1007/s40747-026-02302-7
- Apr 13, 2026
- Complex & Intelligent Systems
- Bingchan Li + 1 more
Switch object detection based on knowledge distillation for real time edge computing
- Research Article
- 10.3390/ani16081143
- Apr 9, 2026
- Animals : an open access journal from MDPI
- Zitian Liu + 5 more
Accurate and scalable body condition scoring (BCS) is important for health monitoring and productivity management in precision livestock farming. However, manual scoring is subjective, labor-intensive, and difficult to standardize, while many automated methods are too computationally demanding for edge deployment in real farm environments. This study proposes EdgeBCS-YOLO, a lightweight object detection framework for real-time beef cattle BCS in unstructured farming scenarios. Built on YOLO11n, it combines Position-Sensitive Feature Fusion (PSFF), a Texture-Aware Star Module (TASM), an Efficient Grouped Detection Head (EGDH), and a Focal and Global Knowledge Distillation (FGD)-based distillation strategy. On a dynamic blurring dataset, EdgeBCS-YOLO achieved 90.8% precision, 82.7% recall, and 88.9% mAP@50. On the NVIDIA Jetson Orin NX Super, it achieved a model size of 3.95 MB, a system FPS of 33.35, and an average inference latency of 13.26 ms. These results suggest that it is a practical and potentially efficient solution for automated BCS on edge devices.
- Research Article
- 10.1038/s41598-026-47417-6
- Apr 9, 2026
- Scientific reports
- Bikram Pratim Bhuyan + 4 more
Chronic kidney disease (CKD) is a progressive condition affecting over 850 million people worldwide, where timely detection and accurate staging are critical to reducing morbidity, mortality, and healthcare burden. Current machine learning approaches often treat CKD prediction as a binary task, neglecting the ordered nature of disease stages, and rarely incorporate physiological constraints or calibrated probability estimates essential for clinical decision support. We propose a multi-stage CKD prediction framework that integrates ordinal classification, probability calibration, and a serum creatinine-monotonicity constraint within a knowledge distillation paradigm. A calibrated CatBoost model serves as a teacher, transferring temperature-scaled class probabilities to an ordinal neural network student augmented with an auxiliary eGFR regression task. Evaluated on a cohort of 750 patients from Al-Ramadi Teaching Hospital, our approach achieved superior macro-F1 and reduced expected calibration error. This work demonstrates that combining ordinal learning, calibration, and physiological constraints yields models that are not only accurate but also aligned with clinical reasoning, offering a pathway toward safer and more trustworthy AI tools for CKD management. Limitations like circular reasoning and target leakage are also discussed.
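For readers unfamiliar with temperature-scaled distillation, the following is a minimal sketch of the generic soft-target loss: a KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. It is not the authors' implementation. The abstract does not state how the CatBoost probabilities are softened, so applying the temperature to teacher log-probabilities is an assumption, as are the function names and the five-class example values.

```python
import math

def soften(scores, temperature):
    """Softmax of scores / temperature; a higher temperature gives a flatter distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_scores, student_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions, scaled by T^2."""
    p = soften(teacher_scores, temperature)
    q = soften(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Hypothetical five-class example (values invented for illustration):
# the teacher supplies log-probabilities, the student supplies raw logits.
teacher_log_probs = [math.log(x) for x in (0.05, 0.10, 0.60, 0.20, 0.05)]
student_logits = [0.1, 0.3, 1.2, 0.8, 0.2]
print(kd_loss(teacher_log_probs, student_logits))
```

In practice this term is usually combined with a supervised loss on the true labels; the weighting between the two is a design choice the abstract does not specify.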
- Research Article
- 10.3390/math14081249
- Apr 9, 2026
- Mathematics
- Qian Sheng + 3 more
Machine learning models are used extensively across a wide array of significant applications. However, their vulnerability to security and privacy attacks is a serious concern, for example, for the protection of financially sensitive data such as account flow. Particularly troubling is the threat of membership inference, which enables attackers to determine whether a given data sample is included in the training set of a targeted machine learning model. Existing knowledge distillation techniques have shown promise in balancing model performance with data privacy. However, achieving strong privacy during the training of the target model is challenging due to the teacher model’s performance limitations and the scarcity of unlabeled benchmark data. To address this issue, we propose a novel framework called Distillation of Self-Adaptive Knowledge (DSAK). DSAK utilizes self-duplicated teacher and noise-generative models to introduce specialized self-adaptive noise for privacy training in the target model. By incorporating new data features derived from this noise, DSAK improves model performance and reduces the risk of memorizing member data. Experimental results demonstrate DSAK’s effectiveness in defending against existing attack schemes across multiple datasets while surpassing other membership inference defense schemes in terms of efficiency.
- Research Article
- 10.1109/tmi.2026.3681075
- Apr 6, 2026
- IEEE transactions on medical imaging
- Fan Li + 11 more
Mild cognitive impairment (MCI) is the prodromal stage of dementia involving complex interactions between the brain and peripheral organs. Emerging evidence indicates that heart dysfunction and gut microbiota dysbiosis can contribute to MCI pathogenesis. Yet, these discoveries of cross-organ interactions have not been applied to assist MCI diagnosis. In this work, we propose a novel diagnostic framework that exploits the interactions of brain, heart, and gut using whole-body PET images to guide MCI diagnosis for scenarios when only brain MRI, PET, or PET&MRI are available. Specifically, we collected a multi-cohort, multi-modal dataset comprising 1,545 whole-body PET images, 6,010 brain MR images, and 2,446 brain PET images from eight data centers. Organ-specific image encoders are first pretrained for the brain, heart, and gut individually. Then, to effectively align and integrate brain, heart, and gut features, we introduce positional prompts to act as anatomical-level attention to highlight disease-relevant spatial regions, and further develop hierarchical Transformers to model brain-heart, brain-gut, and brain-heart-gut interactions. Finally, to achieve MCI diagnosis using only brain images, we transfer the above brain-heart-gut model to a brain-only model via an introduced multi-level knowledge distillation scheme, including sample-level contrastive distillation, group-level distribution alignment, and response-level supervision. Extensive experiments on multi-center data demonstrate the superiority of our method over the state-of-the-art methods by resorting to effective integration of heart and gut interactions for MCI diagnosis.
- Research Article
- 10.1080/10095020.2026.2633014
- Apr 5, 2026
- Geo-spatial Information Science
- Hongjun Ma + 4 more
Object detection demonstrates stronger performance when using multimodal remote-sensing images than when using single-modal data. However, in practical applications, some modalities may be unavailable in a specific area, which limits the application of multimodal object detection methods. To address this challenge, a cross-modal contrastive learning and knowledge distillation (CCLKD) method is proposed in this paper. CCLKD is composed of dual branches for both easy-to-detect and hard-to-detect modalities. During training, CCLKD employs a knowledge distillation strategy to enhance the performance of the hard-to-detect branch (student network) by transferring knowledge from the easy-to-detect branch (teacher network). As a result, when the easy-to-detect modality is absent, CCLKD can obtain good detection performance using only hard-to-detect modal data. To more effectively enhance the representation ability of the hard-to-detect modality, CCLKD introduces an adaptive temperature-based knowledge distillation (ATKD) strategy and a category-constrained contrastive learning (CCL) mechanism. ATKD dynamically adjusts the distillation temperature based on the predicted probability provided by the teacher network and provides logit-, feature-, and relationship-level distillation. CCL strengthens the similarity between instances of the same category while suppressing interference from different categories, thereby improving intraclass compactness and interclass separability in the feature space. We employed three standard object detection datasets and compared CCLKD with state-of-the-art methods to validate its performance. The experimental results demonstrate the effectiveness of each component within CCLKD and further validate its superiority over existing methods through both quantitative and visual comparisons.
- Research Article
- 10.63371/ic.v5.n1.a910
- Apr 4, 2026
- Ibero Ciencias - Revista Científica y Académica - ISSN 3072-7197
- Danny Fabian Cando Gordon + 1 more
This study developed an autonomous computational system for real-time facial emotion recognition, optimized for devices with limited computational resources. A custom convolutional neural network architecture was implemented integrating advanced compression techniques including dynamic quantization, structured pruning, and knowledge distillation. The system was evaluated with 45,000 images from the expanded FER-2013 dataset and validated with 30 participants under diverse real conditions. Experiments were conducted on three platforms: server with NVIDIA RTX 3080 GPU, standard laptop with Intel i5 processor, and Raspberry Pi 4. Results demonstrated an overall accuracy of 86.4% in classifying six basic emotions with an average latency of 45 milliseconds. A 73% reduction in model size was achieved (from 45MB to 12MB) and 60% in CPU usage without significant accuracy degradation. The system maintained 60 FPS on GPU, 30 FPS on standard CPU, and 15 FPS on Raspberry Pi 4. Robustness was validated under variable lighting and resolution conditions, maintaining accuracies above 81% even under challenging conditions. These findings confirm the feasibility of implementing sophisticated emotion recognition with local processing on everyday devices.
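The size figures above can be checked with simple arithmetic, and the reported frame rates can be converted into per-frame time budgets. The short sketch below does both; the helper names are illustrative assumptions, and the frame-period conversion is a general identity (1000 ms divided by FPS), not a measurement from the study.

```python
def size_reduction(before_mb: float, after_mb: float) -> float:
    """Fraction of the original model size removed by compression."""
    return (before_mb - after_mb) / before_mb

def frame_period_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

# Model size before and after compression, as reported (MB).
reduction = size_reduction(45, 12)
compression_ratio = 45 / 12

# Frame rates reported for GPU, standard CPU, and Raspberry Pi 4.
periods = {fps: round(frame_period_ms(fps), 1) for fps in (60, 30, 15)}

print(f"size reduction: {reduction:.1%}, compression ratio: {compression_ratio:.2f}x")
print(periods)
```

The computed reduction of about 73.3% is consistent with the 73% stated in the abstract.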
- Research Article
- 10.3390/s26072243
- Apr 4, 2026
- Sensors (Basel, Switzerland)
- Wenxin Li + 2 more
Cooperative multi-UAV pursuit-evasion under occlusions and sensor noise is challenged by intermittent observability of the evader, varying observation-window lengths, and non-stationary evader tactics, all of which destabilize prediction and undermine safety-constrained cooperation. To address these challenges, we propose a safe decision-making framework that uses behavior mode and subgoal inference as intermediate representations for interpretable, uncertainty-aware cooperation. Specifically, an observation-driven generative intent-subgoal model infers the evader's behavior mode and subgoal from short observation windows. Building on this model, a length-agnostic trajectory predictor is trained via multi-window knowledge distillation and consistency regularization to produce future trajectory predictions with calibrated uncertainty for arbitrary observation-window lengths, thereby reducing cross-window inference inconsistency and lowering online computational cost. Based on these predictions, we derive belief and risk features and develop a belief-risk-gated hierarchical multi-agent policy based on soft actor-critic with a safety projection layer, enabling adaptive strategy switching and a controllable trade-off between efficiency and safety. Experiments in obstacle-rich pursuit-evasion environments with randomized layouts and diverse obstacle configurations demonstrate more stable cooperative capture, safer maneuvering, and lower decision variance than representative baselines, indicating strong robustness and real-time feasibility. Specifically, across different observation-window settings, the proposed method improves the normalized expected return by approximately 5-7% over the strongest baseline and reduces pursuer losses by roughly 22-25%. Moreover, its end-to-end decision latency consistently remains within the 50 ms control cycle.
- Research Article
- 10.1038/s41598-026-46602-x
- Apr 3, 2026
- Scientific reports
- Hongyu Xing + 1 more
Industrial surface defect detection is critical for ensuring product quality and manufacturing efficiency across steel, electronics, and semiconductor sectors. However, practical deployment remains challenging due to the diversity of defect types, scale variations, and complex background noise. To address these issues, we propose YOLO-DCF (YOLO with Dual Distillation and Context-Aware Fusion), a novel and lightweight detection framework built upon YOLO11. The Context-Guided Dynamic Fusion FPN decomposes global context into orthogonal directional components, enabling precise localization of fine-grained defects while suppressing background noise. The C3k2-Dilated Multiscale Contextual Residual module leverages hierarchical receptive fields with a parallel multi-dilation design to capture both local textures and global dependencies. The Dual Block-Channel Knowledge Distillation module enhances model compression via a self-distillation mechanism that decouples spatial and semantic knowledge flows, preserving essential representations during lightweight deployment. These modules enhance detection precision while maintaining real-time inference capability. Extensive experiments on NEU-DET and PKU-Market-PCB datasets validate the effectiveness of YOLO-DCF, which achieves mAP50 scores of 79.3% and 96.5%, respectively, representing significant improvements of 1.6% and 0.8% over baseline methods. Notably, YOLO-DCF demonstrates stronger recall and robustness in detecting fine-grained and low-contrast defects, while maintaining competitive real-time inference capability despite the increased model complexity. This work offers a practical and deployable solution for industrial quality inspection, and sets a new direction for efficient, distribution-aware visual recognition in manufacturing contexts.
- Research Article
- 10.1016/j.aej.2026.03.038
- Apr 1, 2026
- Alexandria Engineering Journal
- Maram Fahaad Almufareh
AMAKD: Adversarial multi-modal attention-based knowledge distillation for robust behaviour anomaly detection in real-world environments
- Research Article
- 10.1016/j.neunet.2025.108409
- Apr 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Hong Zhao + 4 more
Heterogeneous Feature Knowledge Distillation based on Enhanced Feature Projector Correlation
- Research Article
- 10.1016/j.neunet.2025.108347
- Apr 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Jinke Liu + 2 more
Agm-Net: Attention-guided masking denoising anomaly location network
- Research Article
- 10.1109/tpami.2025.3645279
- Apr 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Xiaoxia Zhang + 3 more
Graph Neural Networks (GNNs) have made significant strides in the analysis and modeling of complex network data, particularly excelling in graph and node classification tasks. However, the "closed box" nature of GNNs impedes user understanding and trust, thereby restricting their broader application. This challenge has spurred a growing focus on demystifying GNNs to make their decision-making processes more transparent. Traditional methods for explaining GNNs often rely on selecting subgraphs and employing combinatorial optimization to generate understandable outputs. However, these methods are closely linked to the inherent complexity of GNNs, leading to higher explanation costs. To address this issue, we introduce a lower-complexity proxy model to explain GNNs. Our approach leverages knowledge distillation with inter-layer alignment, specifically targeting the challenge of over-smoothing and its detrimental impact on model explanation. Initially, we distill critical insights from complex GNN models into a more manageable proxy model. We then apply an inter-layer alignment-based distillation technique to ensure alignment between the proxy and the original model, facilitating the extraction of node or edge-level explanations within the proxy framework. We theoretically prove that the explanations derived from the proxy model are faithful to both the proxy and the original model. Additionally, we show that the upper bound of unfaithfulness between the proxy and the original model remains consistent when the distillation error is infinitesimal. This inter-layer alignment knowledge distillation technique enables the proxy model to retain the knowledge learning and topological representation capabilities of the original model to the greatest extent. Experimental evaluations on numerous real-world datasets confirm the effectiveness of our method, demonstrating robust performance.
- Research Article
- 10.1016/j.patcog.2025.112631
- Apr 1, 2026
- Pattern Recognition
- Kang Ke + 4 more
Efficient vision-based occupancy prediction with knowledge distillation
- Research Article
- 10.1016/j.aej.2026.02.029
- Apr 1, 2026
- Alexandria Engineering Journal
- Heng Li + 3 more
New insights for enhancing the intelligence of coal mine: A two-stage method for unsupervised low-light image enhancement and lightweight detection
- Research Article
- 10.1016/j.media.2026.104089
- Apr 1, 2026
- Medical image analysis
- Bingchao Zhao + 7 more
FKDNuSeg: Flawless knowledge distillation for lightweight and fast nuclei instance segmentation and classification
- Research Article
- 10.1016/j.displa.2025.103301
- Apr 1, 2026
- Displays
- Qifan Zhu + 5 more
Industrial Park Anomaly Detection: A virtual-real dataset and an attention-enhanced YOLO model via knowledge distillation