Edge-Reserved Knowledge Distillation for Image Matting

  • Abstract
  • Similar Papers
Abstract

Deep learning-based methods have made significant progress in natural image matting. However, mainstream CNN- and ViT-based approaches focus mainly on capturing global image features and lack specialized treatment of edges. They struggle to accurately distinguish foregrounds from backgrounds with similar colors and textures, which results in blurred edge regions between foreground and background. To address these problems, we propose the Edge-Reserved Knowledge Distillation model (ERKD), which preserves fine edge features while distilling multimodal semantic information from large models. To acquire multi-scale edge features, we design an Edge-Reserved Module and a Multimodal Feature Fusion Module. At the same time, to enhance the capture of edge features, we introduce CLIP for feature-level knowledge distillation. Comprehensive evaluation on the Composition-1k and Distinction-646 datasets shows that the proposed method surpasses existing techniques.
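The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of feature-level distillation against a frozen CLIP-style image encoder; the module names, dimensions, and the cosine-alignment loss are illustrative assumptions, not the ERKD paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillLoss(nn.Module):
    """Align student features with a frozen teacher embedding (e.g., a CLIP image encoder)."""
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Project student features to the teacher's embedding size before alignment.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        s = F.normalize(self.proj(student_feat), dim=-1)
        t = F.normalize(teacher_feat.detach(), dim=-1)   # teacher stays frozen
        return (1.0 - (s * t).sum(dim=-1)).mean()        # cosine-distance alignment

# Hypothetical stand-ins: neither network is the paper's architecture.
student_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 256))
teacher_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))  # stands in for CLIP

distill = FeatureDistillLoss(student_dim=256, teacher_dim=512)
images = torch.randn(4, 3, 224, 224)
loss = distill(student_encoder(images), teacher_encoder(images))
loss.backward()
```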

Similar Papers
  • Research Article
  • Cited by 7
  • 10.1155/2022/8246234
The Influence Mechanism of Knowledge Network Allocation Mechanism on Knowledge Distillation of High-Tech Enterprises
  • Apr 25, 2022
  • Computational Intelligence and Neuroscience
  • Jianlin Yuan + 2 more

Increasing global competition highlights the value of the knowledge-innovation ability of high-tech enterprises. To study how innovative knowledge is acquired, knowledge field activity and knowledge stock ranking are selected as mediating variables, and knowledge resource pooling and knowledge evolution are adopted as moderating variables, to construct a conceptual model and theoretical framework for the influence of the knowledge network allocation mechanism on knowledge distillation; a moderated mediation model is derived, and the influence mechanism of the knowledge network allocation mechanism on knowledge distillation in high-tech enterprises is clarified. A total of 531 valid questionnaires were collected online and offline, and a bias-corrected bootstrap procedure was used to empirically examine the influence mechanism and transmission path of the knowledge network allocation mechanism on knowledge distillation in high-tech enterprises. The empirical results show that the main effect of knowledge network allocation on knowledge distillation in high-tech enterprises is significant, and that knowledge field activity and knowledge stock ranking play differentiated mediating roles between knowledge network allocation and knowledge distillation, each acting as a partial mediator. Knowledge resource pooling positively moderates the positive effect of the knowledge network allocation mechanism on knowledge distillation and significantly strengthens the mediating effect of knowledge field activity, yielding a moderated mediation effect; it does not, however, significantly moderate the path through knowledge stock ranking. Knowledge evolution likewise positively moderates the positive effect of the knowledge network allocation mechanism on knowledge distillation and significantly strengthens the mediating effect of knowledge field activity, yielding a moderated mediation effect, but it does not significantly moderate the path through knowledge stock ranking. This paper empirically studies the effect of the knowledge allocation mechanism on knowledge distillation, enriches the connotation and scope of application of knowledge distillation, clarifies its driving factors and formation mechanism, and further promotes knowledge value and knowledge appreciation in high-tech enterprises. It provides guidance and reference for acquiring innovative knowledge and improving the competitiveness of high-tech enterprises.

  • Research Article
  • 10.1109/tpami.2025.3647862
Forget Me Not: Fighting Local Overfitting With Knowledge Fusion and Distillation.
  • Jan 1, 2025
  • IEEE transactions on pattern analysis and machine intelligence
  • Uri Stern + 2 more

Overfitting in deep neural networks occurs less frequently than expected. This is a puzzling observation, as theory predicts that greater model capacity should eventually lead to overfitting - yet this is rarely seen in practice. But what if overfitting does occur, not globally, but in specific sub-regions of the data space? In this work, we introduce a novel score that measures the forgetting rate of deep models on validation data, capturing what we term local overfitting: a performance degradation confined to certain regions of the input space. We demonstrate that local overfitting can arise even without conventional overfitting, and is closely linked to the double descent phenomenon. Building on these insights, we introduce a two-stage approach that leverages the training history of a single model to recover and retain forgotten knowledge: first, by aggregating checkpoints into an ensemble, and then by distilling it into a single model of the original size, thus enhancing performance without added inference cost. Extensive experiments across multiple datasets, modern architectures, and training regimes validate the effectiveness of our approach. Notably, in the presence of label noise, our method - Knowledge Fusion followed by Knowledge Distillation - outperforms both the original model and independently trained ensembles, achieving a rare win-win scenario: reduced training and inference complexity.
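A rough sketch of the two-stage recipe described above (aggregate training checkpoints into an ensemble, then distill it into a single model) might look like the following; the helper names and the simple averaging/KL choices are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ensemble_probs(model, checkpoints, x):
    """Average the softmax outputs of several saved checkpoints of the same model.
    `checkpoints` is a list of state_dicts collected during training."""
    probs = []
    with torch.no_grad():
        for state_dict in checkpoints:
            model.load_state_dict(state_dict)
            model.eval()
            probs.append(F.softmax(model(x), dim=-1))
    return torch.stack(probs).mean(dim=0)

def distill_step(student, optimizer, x, y, teacher_probs, alpha=0.5):
    """One student update: cross-entropy on labels plus KL toward the checkpoint ensemble."""
    optimizer.zero_grad()
    logits = student(x)
    ce = F.cross_entropy(logits, y)
    kd = F.kl_div(F.log_softmax(logits, dim=-1), teacher_probs, reduction="batchmean")
    loss = alpha * ce + (1 - alpha) * kd
    loss.backward()
    optimizer.step()
    return loss.item()
```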

  • Research Article
  • Cited by 3204
  • 10.1007/s11263-021-01453-z
Knowledge Distillation: A Survey
  • Mar 22, 2021
  • International Journal of Computer Vision
  • Jianping Gou + 3 more

In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architecture, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and comments on future research are discussed and forwarded.
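For reference, the response-based formulation the survey starts from is the classic temperature-scaled soft-label loss; a minimal sketch with toy logits is shown below (this is the standard formulation, not tied to any particular paper in this list).

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Response-based distillation: KL between temperature-softened distributions
    plus a standard cross-entropy term on the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                          # rescale so gradients match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy logits stand in for real teacher/student networks.
teacher_logits = torch.randn(8, 10)
student_logits = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
print(kd_loss(student_logits, teacher_logits, labels))
```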

  • Conference Article
  • Cited by 39
  • 10.1109/mvip49855.2020.9116923
Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation
  • Feb 1, 2020
  • Sajjad Abbasi + 3 more

Knowledge distillation (KD) is a method for transferring the knowledge of a network under training to another network. The conventional application of KD is learning a small model (the student) from the soft labels produced by a complex model (the teacher). Because of the novelty of this idea, the notion of KD has recently been used in different settings, such as model compression and methods aimed at enhancing model accuracy. Although many techniques have been proposed in the area of KD, there is a lack of a model that generalizes KD techniques. In this paper, various studies in the scope of KD are investigated and analyzed to build a general model for KD. All the methods and techniques in KD can be summarized through the proposed model. By utilizing the proposed model, different methods in KD can be better investigated and explored, their advantages and disadvantages can be better understood, and new strategies for KD can be developed. Using the proposed model, different KD methods are represented in an abstract view.

  • Research Article
  • 10.1109/jsen.2024.3517653
Role of Mixup in Topological Persistence Based Knowledge Distillation for Wearable Sensor Data.
  • Feb 1, 2025
  • IEEE sensors journal
  • Eun Som Jeon + 3 more

The analysis of wearable sensor data has enabled many successes in several applications. To represent the high-sampling-rate time series with sufficient detail, the use of topological data analysis (TDA) has been considered, and it is found that TDA can complement other time-series features. Nonetheless, due to the large time consumption and high computational resource requirements of extracting topological features through TDA, it is difficult to deploy topological knowledge in machine learning and various applications. In order to tackle this problem, knowledge distillation (KD) can be adopted, which is a technique facilitating model compression and transfer learning to generate a smaller model by transferring knowledge from a larger network. By leveraging multiple teachers in KD, both time-series and topological features can be transferred, and finally, a superior student using only time-series data is distilled. On the other hand, mixup has been popularly used as a robust data augmentation technique to enhance model performance during training. Mixup and KD employ similar learning strategies. In KD, the student model learns from the smoothed distribution generated by the teacher model, while mixup creates smoothed labels by blending two labels. Hence, this common smoothness serves as the link between the two methods. Although the interplay between mixup and KD has been widely studied, most existing work focuses on image-based analysis only, and it remains unclear how mixup behaves in the context of KD when incorporating multimodal data, such as both time-series and topological knowledge from wearable sensor data. In this paper, we analyze the role of mixup in KD with time-series as well as topological persistence, employing multiple teachers. We present a comprehensive analysis of various methods in KD and mixup, supported by empirical results on wearable sensor data. We observe that applying mixup to training a student in KD improves performance. We suggest a general set of recommendations to obtain an enhanced student.
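A minimal sketch of how mixup can be combined with a distillation step is given below; it uses a single teacher and plain soft-label losses, so the paper's multi-teacher and topological-persistence components are not represented, and all function names are assumptions.

```python
import torch
import torch.nn.functional as F

def mixup(x, y_onehot, alpha=0.2):
    """Standard mixup: blend random pairs of inputs and their one-hot labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], lam * y_onehot + (1 - lam) * y_onehot[perm]

def kd_mixup_loss(student, teacher, x, y_onehot, T=4.0, w_kd=0.7):
    """Apply mixup first, then distill: the teacher supplies soft targets on the mixed inputs."""
    x_mix, y_mix = mixup(x, y_onehot)
    with torch.no_grad():
        t_probs = F.softmax(teacher(x_mix) / T, dim=-1)
    s_logits = student(x_mix)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1), t_probs, reduction="batchmean") * T * T
    ce = -(y_mix * F.log_softmax(s_logits, dim=-1)).sum(dim=-1).mean()  # CE with mixed labels
    return w_kd * kd + (1 - w_kd) * ce
```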

  • Research Article
  • Cited by 11
  • 10.1016/j.knosys.2023.110916
BookKD: A novel knowledge distillation for reducing distillation costs by decoupling knowledge generation and learning
  • Sep 1, 2023
  • Knowledge-Based Systems
  • Songling Zhu + 4 more


  • Research Article
  • Cited by 10
  • 10.1016/j.neucom.2024.127516
Multi-perspective analysis on data augmentation in knowledge distillation
  • Mar 5, 2024
  • Neurocomputing
  • Wei Li + 3 more


  • Research Article
  • Cited by 5
  • 10.3390/computers13080184
Knowledge Distillation in Image Classification: The Impact of Datasets
  • Jul 24, 2024
  • Computers
  • Ange Gabriel Belinga + 3 more

As the demand for efficient and lightweight models in image classification grows, knowledge distillation has emerged as a promising technique to transfer expertise from complex teacher models to simpler student models. However, the efficacy of knowledge distillation is intricately linked to the choice of datasets used during training. Datasets are pivotal in shaping a model’s learning process, influencing its ability to generalize and discriminate between diverse patterns. While considerable research has independently explored knowledge distillation and image classification, a comprehensive understanding of how different datasets impact knowledge distillation remains a critical gap. This study systematically investigates the impact of diverse datasets on knowledge distillation in image classification. By varying dataset characteristics such as size, domain specificity, and inherent biases, we aim to unravel the nuanced relationship between datasets and the efficacy of knowledge transfer. Our experiments employ a range of datasets to comprehensively explore their impact on the performance gains achieved through knowledge distillation. This study contributes valuable guidance for researchers and practitioners seeking to optimize image classification models through knowledge distillation. By elucidating the intricate interplay between dataset characteristics and knowledge distillation outcomes, our findings empower the community to make informed decisions when selecting datasets, ultimately advancing the field toward more robust and efficient model development.

  • Conference Article
  • Cited by 1
  • 10.1109/aiam48774.2019.00106
Obtain Dark Knowledge via Extended Knowledge Distillation
  • Oct 1, 2019
  • Chen Yuan + 1 more

Training a smaller student model on portable devices such as smartphones to mimic a complex and heavy teacher model through knowledge distillation has received extensive attention. Many studies have addressed the application of knowledge distillation to various tasks and the improvement of the supervised information used while training the student model. However, few studies focus on distilling the "dark knowledge" that exists in the teacher model but is hard to express directly; this is important because the training data used to train the teacher model are not always visible to the student model. In this paper, we extend the method of knowledge distillation: we not only take the difference of logits between the teacher model and the student model as part of the loss function, as the basic knowledge distillation method does, but also pay attention to the interior of both models. We divide both the teacher and student models into several segments and make the outputs of these segments as close as possible to form another part of the loss function; this method is referred to as "Extended-KD" (Extended Knowledge Distillation). In our experiments, we used the complete CIFAR-10 dataset to train a student model as a baseline, and then dropped all examples of some labels when training the student model through Extended-KD. Our experiments show that the Extended-KD method performs better than the basic knowledge distillation method, and that knowledge distillation with incomplete datasets can still enable the student model to predict target labels it has never seen. Therefore, the Extended-KD method can obtain dark knowledge properly.
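The segment-matching idea can be illustrated roughly as follows; the toy blocks, equal segment widths, and MSE matching term are assumptions for illustration (a real student would typically be smaller and need a projection before matching), not the paper's Extended-KD code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy segments; real teacher/student segments would differ in width and need a projection.
teacher_segs = nn.ModuleList([nn.Linear(32, 64), nn.Linear(64, 64), nn.Linear(64, 10)])
student_segs = nn.ModuleList([nn.Linear(32, 64), nn.Linear(64, 64), nn.Linear(64, 10)])

def extended_kd_loss(x, T=4.0, beta=0.1):
    """Logit-level KD plus a term pulling corresponding segment outputs together."""
    t, s, seg_loss = x, x, 0.0
    for t_seg, s_seg in zip(teacher_segs, student_segs):
        with torch.no_grad():
            t = t_seg(t)                 # teacher segment output (no gradient)
        s = s_seg(s)                     # student segment output
        seg_loss = seg_loss + F.mse_loss(s, t)
    kd = F.kl_div(F.log_softmax(s / T, dim=-1),
                  F.softmax(t / T, dim=-1), reduction="batchmean") * T * T
    return kd + beta * seg_loss

print(extended_kd_loss(torch.randn(4, 32)))
```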

  • Conference Article
  • Cited by 7
  • 10.1109/icaice54393.2021.00127
Classification of Histopathologic Images of Breast Cancer by Multi-teacher Small-sample Knowledge Distillation
  • Nov 1, 2021
  • Leiqi Wang + 1 more

Model fusion can effectively improve prediction performance, but it increases inference time. In this paper, dual-stage progressive knowledge distillation is improved by combining it with multi-teacher knowledge distillation. A simple and effective method for integrating the soft targets of multiple teachers is proposed, which improves the guiding role of strong models in knowledge distillation. Dual-stage progressive knowledge distillation is a method for small-sample knowledge distillation that uses progressive network grafting to realize distillation in a small-sample setting. In the first stage, student blocks are grafted one by one onto the teacher network and trained together with the remaining teacher blocks, updating only the parameters of the grafted blocks. In the second stage, the trained student blocks are grafted onto the teacher network in turn so that they adapt to each other and finally replace the teacher network, yielding a lighter network structure. Using the soft targets obtained by this method in dual-stage progressive knowledge distillation instead of hard targets, excellent results were obtained on the BreakHis dataset.
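The soft-target integration step could be sketched as a simple weighted average of the teachers' temperature-scaled outputs, as below; the exact integration rule, the grafting procedure, and all names here are assumptions rather than the paper's method.

```python
import torch
import torch.nn.functional as F

def integrated_soft_target(teacher_logits_list, weights=None, T=4.0):
    """Combine several teachers' temperature-scaled outputs into one soft target.
    A plain weighted average is used here; the paper's integration rule may differ."""
    n = len(teacher_logits_list)
    weights = weights if weights is not None else [1.0 / n] * n
    probs = [w * F.softmax(l / T, dim=-1) for w, l in zip(weights, teacher_logits_list)]
    return torch.stack(probs).sum(dim=0)

# Toy logits from three hypothetical teachers.
teachers = [torch.randn(4, 8) for _ in range(3)]
student_logits = torch.randn(4, 8, requires_grad=True)
soft = integrated_soft_target(teachers)
loss = F.kl_div(F.log_softmax(student_logits / 4.0, dim=-1), soft, reduction="batchmean")
print(loss)
```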

  • Research Article
  • Cited by 2
  • 10.1177/15501477211057037
Feature fusion-based collaborative learning for knowledge distillation
  • Nov 1, 2021
  • International Journal of Distributed Sensor Networks
  • Yiting Li + 4 more

Deep neural networks have achieved great success in a variety of applications, such as self-driving cars and intelligent robotics. Meanwhile, knowledge distillation has received increasing attention as an effective model compression technique for training very efficient deep models. The performance of the student network obtained through knowledge distillation heavily depends on whether the transfer of the teacher’s knowledge can effectively guide the student training. However, most existing knowledge distillation schemes require a large teacher network pre-trained on large-scale data sets, which can increase the difficulty of knowledge distillation in different applications. In this article, we propose feature fusion-based collaborative learning for knowledge distillation. Specifically, during knowledge distillation, it enables networks to learn from each other using the feature/response-based knowledge in different network layers. We concatenate the features learned by the teacher and the student networks to obtain a more representative feature map for knowledge transfer. In addition, we also introduce a network regularization method to further improve the model performance by providing positive knowledge during training. Experiments and ablation studies on two widely used data sets demonstrate that the proposed method, feature fusion-based collaborative learning, significantly outperforms recent state-of-the-art knowledge distillation methods.
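A hypothetical sketch of the feature-concatenation idea is shown below: teacher and student feature maps are concatenated and fused, and the student is pulled toward the fused map. The 1x1-convolution fusion head, the detach, and the MSE target are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    """Concatenate teacher and student feature maps and fuse them with a 1x1 convolution;
    the fused map then serves as a (detached) target for the student features."""
    def __init__(self, t_ch, s_ch):
        super().__init__()
        self.fuse = nn.Conv2d(t_ch + s_ch, s_ch, kernel_size=1)

    def forward(self, t_feat, s_feat):
        fused = self.fuse(torch.cat([t_feat, s_feat], dim=1))
        return F.mse_loss(s_feat, fused.detach())

head = FusionHead(t_ch=128, s_ch=64)
t_feat = torch.randn(2, 128, 14, 14)                     # hypothetical teacher feature map
s_feat = torch.randn(2, 64, 14, 14, requires_grad=True)  # hypothetical student feature map
print(head(t_feat, s_feat))
```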

  • Research Article
  • Cited by 1
  • 10.1088/1742-6596/1982/1/012084
Adaptive Teacher Finetune: Towards high-performance knowledge distillation through adaptive fine-tuning
  • Jul 1, 2021
  • Journal of Physics: Conference Series
  • Zhenyan Hou + 1 more

Knowledge distillation is a widely used method to transfer knowledge from a large model to a small model. Traditional methods use a pre-trained large model to supervise the training of a small model, which is called offline knowledge distillation; however, the structural gap between teacher and student limits its performance. Online knowledge distillation instead retrains the teacher-student network from scratch, and this joint-training scheme greatly improves performance. Yet there is very little work exploring the difference between the two. In this paper, we first point out that the essential difference between offline and online knowledge distillation is whether the weights of the teacher and student networks undergo a process of mutual adaptation. If the teacher and student networks are first trained jointly and offline knowledge distillation is then applied, there is no obvious difference in final performance, regardless of whether the distillation itself is performed jointly. This shows that teacher-student adaptation is important for knowledge distillation. We therefore propose Adaptive Teacher Finetune (ATF), which adapts the teacher model to the student network by using information from the student model to finetune the teacher during the offline knowledge distillation process. With a normalized logit distribution and alpha-divergence, the performance improvement of ATF clearly exceeds existing offline and online knowledge distillation methods. Extensive experiments conducted on CIFAR and ImageNet support our analysis and conclusions. With the newly introduced ATF, we obtain state-of-the-art performance with ResNet-18 on ImageNet.
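Reading the abstract literally, the adaptation idea might be sketched as alternating a standard student distillation step with a light teacher finetuning step that uses the student's distribution; the specific losses, the 0.1 weighting, and the alternation schedule below are guesses, and the paper's normalized logit distribution and alpha-divergence are not reproduced.

```python
import torch
import torch.nn.functional as F

def atf_round(teacher, student, t_opt, s_opt, x, y, T=4.0):
    """One hypothetical adaptation round: distill into the student, then lightly
    finetune the teacher toward the student's (detached) distribution."""
    # Student step: standard KD against the current teacher.
    s_opt.zero_grad()
    with torch.no_grad():
        t_probs = F.softmax(teacher(x) / T, dim=-1)
    s_logits = student(x)
    s_loss = F.cross_entropy(s_logits, y) + F.kl_div(
        F.log_softmax(s_logits / T, dim=-1), t_probs, reduction="batchmean") * T * T
    s_loss.backward()
    s_opt.step()

    # Teacher step: small finetune using the student's current predictions.
    t_opt.zero_grad()
    with torch.no_grad():
        s_probs = F.softmax(student(x) / T, dim=-1)
    t_logits = teacher(x)
    t_loss = F.cross_entropy(t_logits, y) + 0.1 * F.kl_div(
        F.log_softmax(t_logits / T, dim=-1), s_probs, reduction="batchmean")
    t_loss.backward()
    t_opt.step()
    return s_loss.item(), t_loss.item()
```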

  • Book Chapter
  • Cited by 9
  • 10.1007/978-3-031-26284-5_31
What Role Does Data Augmentation Play in Knowledge Distillation?
  • Jan 1, 2023
  • Wei Li + 5 more

Knowledge distillation is an effective way to transfer knowledge from a large model to a small model, which can significantly improve the performance of the small model. In recent years, some contrastive learning-based knowledge distillation methods (i.e., SSKD and HSAKD) have achieved excellent performance by utilizing data augmentation. However, the value of data augmentation has always been overlooked by researchers in knowledge distillation, and no work analyzes its role in particular detail. To fill this gap, we analyze the effect of data augmentation on knowledge distillation from a multi-sided perspective. In particular, we demonstrate the following properties of data augmentation: (a) data augmentation can effectively help knowledge distillation work even if the teacher model does not have the information about augmented samples, and our proposed diverse and rich Joint Data Augmentation (JDA) is more effective than a single rotation in knowledge distillation; (b) using diverse and rich augmented samples to assist the teacher model in training can improve its performance, but not the performance of the student model; (c) the student model can achieve excellent performance when the proportion of augmented samples is within a suitable range; (d) data augmentation enables knowledge distillation to work better in a few-shot scenario; (e) data augmentation is seamlessly compatible with some knowledge distillation methods and can potentially further improve their performance. Enlightened by the above analysis, we propose a method named Cosine Confidence Distillation (CCD) to transfer the augmented samples’ knowledge more reasonably. And CCD achieves better performance than the latest SOTA HSAKD with fewer storage requirements on CIFAR-100 and ImageNet-1k. Our code is released at https://github.com/liwei-group/CCD.
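A minimal sketch of the basic setting analyzed here (distilling on augmented samples the teacher may never have seen) follows; the rotation augmentation stands in for the richer JDA policy, and CCD's cosine-confidence weighting is not reproduced.

```python
import torch
import torch.nn.functional as F

def rotate_batch(x):
    """Append 90/180/270-degree rotated copies of a batch (B, C, H, W)."""
    return torch.cat([x] + [torch.rot90(x, k, dims=(2, 3)) for k in (1, 2, 3)], dim=0)

def augmented_kd_loss(student, teacher, x, T=4.0):
    """Distill on original plus augmented samples; the teacher may never have
    been trained on the augmented views, which is one case the paper examines."""
    x_all = rotate_batch(x)
    with torch.no_grad():
        t_probs = F.softmax(teacher(x_all) / T, dim=-1)
    s_log = F.log_softmax(student(x_all) / T, dim=-1)
    return F.kl_div(s_log, t_probs, reduction="batchmean") * T * T
```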

  • Research Article
  • Cited by 1
  • 10.1038/s41598-025-91152-3
Counterclockwise block-by-block knowledge distillation for neural network compression
  • Apr 3, 2025
  • Scientific Reports
  • Xiaowei Lan + 6 more

Model compression is a technique for transforming large neural network models into smaller ones. Knowledge distillation (KD) is a crucial model compression technique that involves transferring knowledge from a large teacher model to a lightweight student model. Existing knowledge distillation methods typically facilitate the knowledge transfer from teacher to student models in one or two stages. This paper introduces a novel approach called counterclockwise block-wise knowledge distillation (CBKD) to optimize the knowledge distillation process. The core idea of CBKD is to mitigate the generation gap between teacher and student models and facilitate the transmission of intermediate-layer knowledge from the teacher model. It divides both teacher and student models into multiple sub-network blocks, and in each stage of knowledge distillation, only the knowledge from one teacher sub-block is transferred to the corresponding position of a student sub-block. Additionally, in the CBKD process, deeper teacher sub-network blocks are assigned higher compression rates. Extensive experiments on tiny-imagenet-200 and CIFAR-10 demonstrate that the proposed CBKD method can enhance the distillation performance of various mainstream knowledge distillation approaches.
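The block-wise transfer could be sketched as training one student sub-block at a time against the corresponding teacher sub-block, as below; the toy blocks, the MSE objective, the reversed visiting order standing in for "counterclockwise", and the absence of per-block compression rates are all simplifications, not the paper's procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy aligned sub-network blocks; a real student block would be smaller (higher compression).
teacher_blocks = nn.ModuleList([nn.Linear(32, 64), nn.Linear(64, 64), nn.Linear(64, 10)])
student_blocks = nn.ModuleList([nn.Linear(32, 64), nn.Linear(64, 64), nn.Linear(64, 10)])

def distill_single_block(idx, x, lr=1e-3, steps=100):
    """Train only the idx-th student block to match the corresponding teacher block,
    feeding it the teacher's activations at that depth."""
    opt = torch.optim.Adam(student_blocks[idx].parameters(), lr=lr)
    for _ in range(steps):
        with torch.no_grad():
            h = x
            for t_blk in teacher_blocks[:idx]:
                h = t_blk(h)
            target = teacher_blocks[idx](h)
        opt.zero_grad()
        loss = F.mse_loss(student_blocks[idx](h), target)
        loss.backward()
        opt.step()
    return loss.item()

# Stage-by-stage transfer; visiting blocks in reverse order stands in for the CBKD schedule.
x = torch.randn(16, 32)
for i in reversed(range(len(student_blocks))):
    distill_single_block(i, x)
```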

  • Conference Article
  • Cited by 1
  • 10.1145/3442555.3442581
Adversarial Metric Knowledge Distillation
  • Nov 27, 2020
  • Zihe Dong + 3 more

Knowledge distillation is dedicated to improving the performance of lightweight networks by transferring knowledge during the training process. Meanwhile, it is important to apply knowledge distillation in different situations. Previous knowledge distillation methods with adversarial samples use a traditional knowledge distillation loss to let the student learn a good decision boundary. In this paper, we propose a novel method named Adversarial Metric Knowledge Distillation (AMKD), which utilizes adversarial samples to transfer dark knowledge from the teacher to the student. We select adversarial samples that are close to the decision boundary between two classes and measure their distance to negative-class samples with a triplet-loss constraint. The method ensures that the student network learns relationships among samples through quantitative metric learning. Therefore, we not only transfer information about the decision boundary but also ensure that the student network always maintains a proper distance from other negative classes. This is another useful exploration of knowledge distillation with adversarial samples. Experiments on the CIFAR-10, CIFAR-100 and Tiny ImageNet datasets verify that the proposed knowledge distillation method effectively improves student network performance.
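A rough sketch of combining adversarial samples with a triplet term in the student's metric space is given below; the FGSM attack, the choice of generating adversarial samples from the teacher, and the `embed` network are assumptions, not the paper's sample-selection rule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Simple FGSM attack used to obtain samples near the decision boundary."""
    x_adv = x.clone().detach().requires_grad_(True)
    model.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()

def amkd_losses(student, teacher, embed, x, y, x_neg, T=4.0, margin=1.0):
    """KD on clean logits plus a triplet term: adversarial anchor, clean positive,
    and a negative-class sample, all mapped into the student's metric space by `embed`."""
    x_adv = fgsm(teacher, x, y)           # adversarial samples from the teacher (assumed choice)
    with torch.no_grad():
        t_probs = F.softmax(teacher(x) / T, dim=-1)
    kd = F.kl_div(F.log_softmax(student(x) / T, dim=-1), t_probs,
                  reduction="batchmean") * T * T
    triplet = nn.TripletMarginLoss(margin=margin)(embed(x_adv), embed(x), embed(x_neg))
    return kd + triplet
```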
