Highly Constrained Coded Aperture Imaging Systems Design Via a Knowledge Distillation Approach
Computational optical imaging (COI) systems have enabled the acquisition of high-dimensional signals through optical coding elements (OCEs). OCEs encode the high-dimensional signal in one or more snapshots, which are subsequently decoded using computational algorithms. Currently, COI systems are optimized through an end-to-end (E2E) approach, where the OCEs are modeled as a layer of a neural network and the remaining layers perform a specific imaging task. However, the performance of COI systems optimized through E2E is limited by the physical constraints imposed by these systems. This paper proposes a knowledge distillation (KD) framework for the design of highly physically constrained COI systems. This approach employs the KD methodology, which consists of a teacher-student relationship in which a high-performance, unconstrained COI system (the teacher) guides the optimization of a physically constrained system (the student) characterized by a limited number of snapshots. We validate the proposed approach using a binary coded aperture single-pixel camera for monochromatic and multispectral image reconstruction. Simulation results demonstrate the superiority of the KD scheme over traditional E2E optimization for the design of highly physically constrained COI systems.
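As a rough illustration of the teacher-student setup described above, the following is a minimal sketch (with assumed names, shapes, and losses, not the authors' released code): an unconstrained teacher sensing layer and a binary, few-snapshot student share a single-pixel sensing abstraction, and the student is trained against both the ground truth and the teacher's reconstruction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodedApertureLayer(nn.Module):
    """Single-pixel sensing layer: y = H x, with H optionally binarized."""
    def __init__(self, n_pixels, n_snapshots, binary=False):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_snapshots, n_pixels) * 0.1)
        self.binary = binary

    def forward(self, x):                      # x: (batch, n_pixels)
        H = self.weight
        if self.binary:                        # straight-through binarization:
            H = H + (torch.sign(H) - H).detach()  # forward sign, identity grad
        return x @ H.t()                       # measurements: (batch, n_snapshots)

def kd_loss(student_rec, teacher_rec, target, alpha=0.5):
    """Blend the reconstruction task loss with a distillation term that pulls
    the constrained student toward the unconstrained teacher's output."""
    task = F.mse_loss(student_rec, target)
    distill = F.mse_loss(student_rec, teacher_rec.detach())
    return (1 - alpha) * task + alpha * distill
```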
- Book Chapter
10
- 10.1016/b978-0-32-385787-1.00013-0
- Jan 1, 2022
- Deep Learning for Robot Perception and Cognition
Chapter 8 - Knowledge distillation
- Research Article
30
- 10.1016/j.artmed.2021.102176
- Sep 17, 2021
- Artificial Intelligence in Medicine
Classification of diabetic retinopathy using unlabeled data and knowledge distillation
- Conference Article
7
- 10.1109/icme51207.2021.9428395
- Jul 5, 2021
In recent years, knowledge distillation for semantic segmentation has been extensively studied in order to obtain satisfactory performance while reducing computational costs. Compared with natural images, segmentation targets in medical images have fuzzy boundaries that are difficult to determine, but current knowledge distillation methods fail to explicitly transfer boundary information, resulting in poor boundary discrimination in compact models. Therefore, in this paper, we propose two knowledge distillation modules, namely boundary-guided deep supervision and output-space boundary embedding alignment, to explicitly transfer boundary information. The validity of our knowledge distillation approaches is demonstrated by extensive experiments on four public medical image data sets, namely, Montgomery County, CHAOS, GlaS and DRISHTI-GS.
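A hedged sketch of boundary-weighted distillation for segmentation follows; the morphological boundary extraction via max-pooling is an illustrative choice of mechanism, not necessarily the modules proposed in the paper.

```python
import torch
import torch.nn.functional as F

def boundary_mask(labels, kernel=3):
    """Approximate boundaries: dilation minus erosion of the label map."""
    lab = labels.float().unsqueeze(1)                       # (B,1,H,W)
    dil = F.max_pool2d(lab, kernel, stride=1, padding=kernel // 2)
    ero = -F.max_pool2d(-lab, kernel, stride=1, padding=kernel // 2)
    return (dil != ero).float()                             # 1 on boundary pixels

def boundary_kd_loss(student_logits, teacher_logits, labels, T=2.0):
    """Pixel-wise KL between softened predictions, up-weighted on boundaries."""
    mask = boundary_mask(labels)                            # (B,1,H,W)
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    kl = (p_t * (p_t.clamp_min(1e-8).log() - log_p_s)).sum(1, keepdim=True)
    return (kl * (1.0 + mask)).mean() * T * T               # double boundary weight
```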
- Research Article
7
- 10.3390/fire6120446
- Nov 22, 2023
- Fire
This paper investigates the application of the YOLOv7 object detection model combined with knowledge distillation techniques in forest fire detection. As an advanced object detection model, YOLOv7 boasts efficient real-time detection capabilities. However, its performance may be constrained in resource-limited environments. To address this challenge, this research proposes a novel approach: considering that deep neural networks undergo multi-layer mapping from the input to the output space, we define the knowledge propagation between layers by evaluating the dot product of features extracted from two different layers. To this end, we utilize the Flow of Solution Procedure (FSP) matrix based on the Gram matrix and redesign the distillation loss using the Pearson correlation coefficient, presenting a new knowledge distillation method termed ILKDG (Intermediate Layer Knowledge Distillation with Gram Matrix-based Feature Flow). Compared with the classical knowledge distillation algorithm, KD, ILKDG achieved a significant performance improvement on a self-created forest fire detection dataset. Specifically, without altering the student network’s parameters or network layers, mAP@0.5 improved by 2.9%, and mAP@0.5:0.95 increased by 2.7%. These results indicate that the proposed ILKDG method effectively enhances the accuracy and performance of forest fire detection without introducing additional parameters. The ILKDG method, based on the Gram matrix and Pearson correlation coefficient, presents a novel knowledge distillation approach, providing a fresh avenue for future research. Researchers can further optimize and refine this method to achieve superior results in fire detection.
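The FSP matrix and the Pearson-correlation loss named above are concrete enough to sketch; shapes and names below are illustrative assumptions, not the paper's reference code.

```python
import torch

def fsp_matrix(f1, f2):
    """Flow-of-Solution-Procedure (Gram-based) matrix between two layers.
    f1: (B, C1, H, W), f2: (B, C2, H, W) with matching spatial size."""
    b, c1, h, w = f1.shape
    return torch.einsum('bchw,bdhw->bcd', f1, f2) / (h * w)   # (B, C1, C2)

def pearson_loss(g_student, g_teacher, eps=1e-8):
    """1 - Pearson correlation between flattened FSP matrices."""
    s = g_student.flatten(1)
    t = g_teacher.flatten(1)
    s = s - s.mean(dim=1, keepdim=True)
    t = t - t.mean(dim=1, keepdim=True)
    r = (s * t).sum(1) / (s.norm(dim=1) * t.norm(dim=1) + eps)
    return (1.0 - r).mean()
```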
- Research Article
19
- 10.1016/j.jksuci.2023.101616
- Jun 14, 2023
- Journal of King Saud University - Computer and Information Sciences
Recently, deep neural networks (DNNs) have been used successfully in many fields, particularly in medical diagnosis. However, deep learning (DL) models are expensive in terms of memory and computing resources, which hinders their implementation in resource-limited devices or delay-sensitive systems. Therefore, these deep models need to be accelerated and compressed to smaller sizes to be deployed on edge devices without noticeably affecting their performance. In this paper, recent acceleration and compression approaches for DNNs are analyzed and compared regarding their performance, applications, benefits, and limitations, with particular focus on knowledge distillation as a successful emergent approach in this field. In addition, a framework is proposed to develop knowledge-distilled DNN models that can be deployed on fog/edge devices for automatic disease diagnosis. To evaluate the proposed framework, two compressed medical diagnosis systems are proposed based on knowledge-distilled deep neural models for both COVID-19 and malaria. The experimental results show that these knowledge-distilled models were compressed to 18.4% and 15% of the original model size and their responses accelerated by 6.14x and 5.86x, respectively, with no significant drop in their performance (0.9% and 1.2%, respectively). Furthermore, the distilled models are compared with other pruned and quantized models. The obtained results reveal the superiority of the distilled models in terms of compression rates and response time.
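For reference, the classical soft-target distillation loss (Hinton et al.) that compression frameworks like the one above typically build on; a generic sketch, with temperature and weighting as assumed hyperparameters.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of softened teacher matching and hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * T * T     # T^2 rescales gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```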
- Research Article
19
- 10.1016/j.media.2023.102916
- Oct 1, 2023
- Medical Image Analysis
CReg-KD: Model refinement via confidence regularized knowledge distillation for brain imaging
- Book Chapter
4
- 10.1007/978-3-030-84186-7_14
- Jan 1, 2021
The irrelevant information in documents poses a great challenge for machine reading comprehension (MRC). To deal with this challenge, current MRC models generally fall into two separate parts: evidence extraction and answer prediction, where the former extracts the key evidence corresponding to the question and the latter predicts the answer based on those sentences. However, such pipeline paradigms tend to accumulate errors, i.e., extracting incorrect evidence results in predicting the wrong answer. To address this problem, we propose a Multi-Strategy Knowledge Distillation based Teacher-Student framework (MSKDTS) for machine reading comprehension. In our approach, we first take the evidence and the document, respectively, as the input reference information to build a teacher model and a student model. Then the multi-strategy knowledge distillation method transfers knowledge from the teacher model to the student model at both the feature and prediction levels. Therefore, in the testing phase, the enhanced student model can predict answers similar to the teacher model's without being aware of which sentence is the corresponding evidence in the document. Experimental results on the ReCO dataset demonstrate the effectiveness of our approach, and further ablation studies prove the effectiveness of both knowledge distillation strategies.
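A hedged sketch of distilling at both the feature and prediction levels, as MSKDTS does; the projection layer, temperature, and loss weights are illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiStrategyKD(nn.Module):
    """Combine a feature-level MSE term with a prediction-level KL term."""
    def __init__(self, d_student, d_teacher, T=2.0, w_feat=0.5, w_pred=0.5):
        super().__init__()
        self.proj = nn.Linear(d_student, d_teacher)   # align feature widths
        self.T, self.w_feat, self.w_pred = T, w_feat, w_pred

    def forward(self, s_feat, t_feat, s_logits, t_logits):
        feat_loss = F.mse_loss(self.proj(s_feat), t_feat.detach())
        pred_loss = F.kl_div(F.log_softmax(s_logits / self.T, dim=-1),
                             F.softmax(t_logits.detach() / self.T, dim=-1),
                             reduction='batchmean') * self.T ** 2
        return self.w_feat * feat_loss + self.w_pred * pred_loss
```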
- Research Article
11
- 10.1038/s41598-024-59553-y
- Apr 20, 2024
- Scientific Reports
Multimodal spectral imaging offers a unique approach to enhancing the analytical capabilities of standalone spectroscopy techniques by combining information gathered from distinct sources. In this manuscript, we explore such opportunities by focusing on two well-known spectral imaging techniques, namely laser-induced breakdown spectroscopy (LIBS) and hyperspectral imaging, for a case study involving mineral identification. Specifically, the work builds upon two distinct approaches: traditional sensor fusion, where we strive to increase the information gathered by combining the two modalities, and a knowledge distillation approach, where LIBS is used as an autonomous supervisor for hyperspectral imaging. Our results show the potential of both approaches in enhancing performance over a single-modality sensing system, highlighting in particular the advantages of the knowledge distillation framework in maximizing the benefits of using multiple techniques to build more interpretable models and paving the way for industrial applications.
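A sketch of the cross-modal supervision idea: a teacher trained on LIBS spectra soft-labels co-registered pixels for an HSI student. Function and tensor names are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def cross_modal_kd_step(student, teacher, hsi_pixels, libs_spectra,
                        optimizer, T=2.0):
    """One training step: the student sees only HSI, the teacher only LIBS.
    hsi_pixels and libs_spectra are assumed co-registered (same samples)."""
    with torch.no_grad():
        soft_targets = F.softmax(teacher(libs_spectra) / T, dim=1)
    log_p = F.log_softmax(student(hsi_pixels) / T, dim=1)
    loss = F.kl_div(log_p, soft_targets, reduction='batchmean') * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```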
- Research Article
- 10.5753/jisa.2025.5154
- Oct 2, 2025
- Journal of Internet Services and Applications
Federated Learning (FL) is a distributed approach in which multiple devices collaborate to train a shared, global model (GM). During training, client devices must frequently communicate their gradients to the central server to update the GM weights, which incurs significant communication costs (bandwidth utilization and the number of messages exchanged). The heterogeneous nature of clients' local datasets poses an extra challenge to model training. In this sense, we introduce FedSeleKDistill, a Federated Selection and Knowledge Distillation Algorithm, to decrease the overall communication costs. FedSeleKDistill is an innovative combination of (i) client selection and (ii) knowledge distillation, with three main objectives: (i) reducing the number of devices training at every round; (ii) decreasing the number of rounds until convergence; and (iii) mitigating the effect of clients' heterogeneous data on GM effectiveness. In this paper, we extend the results of the initial paper presenting FedSeleKDistill. Additional experimental evaluations on the MNIST and German Traffic Signs Benchmark datasets demonstrate that FedSeleKDistill is highly efficient in training the GM until convergence in heterogeneous FL. FedSeleKDistill reaches a higher accuracy score and faster convergence than state-of-the-art models. Our results also show higher performance when analyzing accuracy scores on the clients' local datasets.
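A minimal sketch of combining client selection with distillation toward the global model, in the spirit of FedSeleKDistill; the selection score (dataset size), aggregation rule, and hyperparameters are simplifying assumptions, not the paper's algorithm.

```python
import copy
import torch
import torch.nn.functional as F

def local_train(client_model, global_model, loader, epochs=1, alpha=0.5, lr=0.01):
    """Local update with a distillation term toward the frozen global model."""
    opt = torch.optim.SGD(client_model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                t_logits = global_model(x)            # global model as teacher
            s_logits = client_model(x)
            loss = ((1 - alpha) * F.cross_entropy(s_logits, y)
                    + alpha * F.kl_div(F.log_softmax(s_logits, dim=1),
                                       F.softmax(t_logits, dim=1),
                                       reduction='batchmean'))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return client_model.state_dict()

def fed_round(global_model, client_loaders, n_selected):
    """Select a client subset, train locally, then average (naive FedAvg)."""
    scores = {i: len(dl.dataset) for i, dl in client_loaders.items()}  # toy score
    chosen = sorted(scores, key=scores.get, reverse=True)[:n_selected]
    states = [local_train(copy.deepcopy(global_model), global_model,
                          client_loaders[i]) for i in chosen]
    avg = {k: torch.stack([s[k].float() for s in states]).mean(0)
           for k in states[0]}          # assumes float params only (no BN buffers)
    global_model.load_state_dict(avg)
```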
- Research Article
2
- 10.1016/j.cviu.2023.103853
- Oct 6, 2023
- Computer Vision and Image Understanding
ARCTIC: A knowledge distillation approach via attention-based relation matching and activation region constraint for RGB-to-Infrared videos action recognition
- Research Article
11
- 10.1051/shsconf/202213903002
- Jan 1, 2022
- SHS Web of Conferences
Despite recent advances in deep learning, the rise of edge devices and the exponential growth of Internet of Things (IoT) connected devices strain the performance of deep learning models. It is clear that the future of computing is moving to edge devices. Autonomous vehicles and self-driving cars have leveraged the power of computer vision, especially object detection, to navigate through traffic safely. Nevertheless, to drive on all types of roads, these vehicles must be equipped with a road anomaly detection system, which underscores the need for small deep learning models for road anomaly detection that can be deployed on such vehicles for a safer driving experience. However, current deep learning models are impractical on embedded devices due to their heavy resource requirements. This paper proposes a theoretical approach to building a lightweight model from a cumbersome pothole detection model, suitable for edge devices, using knowledge distillation. It presents the theory of knowledge distillation and why it is a better model compression technique than the alternatives. It shows that a cumbersome model can be made lightweight without sacrificing accuracy, with reduced time complexity and faster training time.
- Research Article
9
- 10.1109/jsen.2022.3181238
- Jul 15, 2022
- IEEE Sensors Journal
For deep learning model training, most existing supervised learning-based person re-identification (Re-ID) models require considerable annotated data as samples. However, generating labeled data is labor-intensive in many real-world situations. Meanwhile, the large scale of existing models increases the burden of model training. To this end, this paper proposes a person Re-ID method based on deep batch active learning and knowledge distillation. With the goal of minimizing the human labeling cost and maximizing the performance of the person Re-ID model, a batch active learning algorithm is applied to person Re-ID, which selects samples with both uncertainty and diversity, so the model needs only a small amount of labeled data to achieve high performance. In addition, a knowledge distillation approach is used to compress the original backbone model, reducing the model scale while maintaining performance. Furthermore, experiments on two public datasets demonstrate the effectiveness and superiority of the proposed method.
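A sketch of batch selection mixing uncertainty and diversity as the method above describes; predictive entropy plus greedy farthest-point selection is one common instantiation, used here as an illustrative assumption.

```python
import torch

def select_batch(probs, embeddings, k):
    """probs: (N, C) softmax outputs; embeddings: (N, D); returns k indices."""
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(1)   # uncertainty
    chosen = [int(entropy.argmax())]                          # most uncertain seed
    min_d = torch.cdist(embeddings, embeddings[chosen]).squeeze(1)
    for _ in range(k - 1):
        score = entropy * min_d                # blend uncertainty and diversity
        score[chosen] = -float('inf')          # never re-pick a chosen sample
        nxt = int(score.argmax())
        chosen.append(nxt)
        d_new = (embeddings - embeddings[nxt]).norm(dim=1)
        min_d = torch.minimum(min_d, d_new)    # distance to nearest chosen point
    return chosen
```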
- Research Article
208
- 10.1109/tpami.2020.3001940
- Jun 12, 2020
- IEEE Transactions on Pattern Analysis and Machine Intelligence
In this work, we consider transferring the structure information from large networks to compact ones for dense prediction tasks in computer vision. Previous knowledge distillation strategies used for dense prediction tasks often directly borrow the distillation scheme for image classification and perform knowledge distillation for each pixel separately, leading to sub-optimal performance. Here we propose to distill structured knowledge from large networks to compact networks, taking into account the fact that dense prediction is a structured prediction problem. Specifically, we study two structured distillation schemes: i) pair-wise distillation that distills the pair-wise similarities by building a static graph; and ii) holistic distillation that uses adversarial training to distill holistic knowledge. The effectiveness of our knowledge distillation approaches is demonstrated by experiments on three dense prediction tasks: semantic segmentation, depth estimation and object detection. Code is available at https://git.io/StructKD.
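A sketch of the pair-wise distillation scheme described above: match the pairwise similarity graphs computed from teacher and student feature maps. Pooling to a small grid first is an assumption to keep the graph tractable.

```python
import torch
import torch.nn.functional as F

def pairwise_similarity(feat, grid=8):
    """feat: (B, C, H, W) -> (B, grid*grid, grid*grid) cosine similarities."""
    f = F.adaptive_avg_pool2d(feat, grid).flatten(2).transpose(1, 2)  # (B,N,C)
    f = F.normalize(f, dim=2)
    return f @ f.transpose(1, 2)                                     # (B,N,N)

def pairwise_kd_loss(s_feat, t_feat):
    """MSE between student and teacher pairwise similarity graphs."""
    return F.mse_loss(pairwise_similarity(s_feat),
                      pairwise_similarity(t_feat).detach())
```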
- Research Article
33
- 10.1016/j.compbiomed.2022.105581
- May 12, 2022
- Computers in Biology and Medicine
Knowledge distillation approach towards melanoma detection
- Research Article
17
- 10.1186/s40537-024-00928-3
- May 4, 2024
- Journal of Big Data
Large Language Models (LLMs) are characterized by their inherent memory inefficiency and compute-intensive nature, making them impractical to run on low-resource devices and hindering their applicability in edge AI contexts. To address this issue, knowledge distillation approaches have been adopted to transfer knowledge from a complex model, referred to as the teacher, to a more compact, computationally efficient one, known as the student. The aim is to retain the performance of the original model while substantially reducing computational requirements. However, traditional knowledge distillation methods may struggle to effectively transfer crucial explainable knowledge from an LLM teacher to the student, potentially leading to explanation inconsistencies and decreased performance. This paper presents DiXtill, a method based on a novel approach to distilling knowledge from LLMs into lightweight neural architectures. The main idea is to leverage local explanations provided by an eXplainable Artificial Intelligence (XAI) method to guide the cross-architecture distillation of a teacher LLM into a self-explainable student, specifically a bi-directional LSTM network. Experimental results show that our XAI-driven distillation method allows the teacher explanations to be effectively transferred to the student, resulting in better agreement compared to classical distillation methods, thus enhancing the student's interpretability. Furthermore, it enables the student to achieve performance comparable to the teacher LLM while delivering a significantly higher compression ratio and speedup compared to other techniques such as post-training quantization and pruning, paving the way for more efficient and sustainable edge AI applications.
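A hedged sketch of explanation-aligned distillation in the spirit of DiXtill: alongside the usual soft-label term, the student's per-token attribution is pulled toward the teacher's XAI attribution. How attributions are produced (attention weights, gradient-based saliency, etc.) is left abstract here, and the loss form and weights are assumptions.

```python
import torch
import torch.nn.functional as F

def dixtill_loss(s_logits, t_logits, s_attrib, t_attrib,
                 T=2.0, alpha=0.5, beta=0.5):
    """s_attrib, t_attrib: (B, L) token-importance scores for each input."""
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction='batchmean') * T * T
    expl = F.mse_loss(F.normalize(s_attrib, dim=1),      # align explanations
                      F.normalize(t_attrib, dim=1))
    return alpha * kd + beta * expl
```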