SpikeBERT: A language spikformer learned from BERT with knowledge distillation.
- Research Article
1
- 10.1109/tii.2025.3641291
- Jan 1, 2025
- IEEE Transactions on Industrial Informatics
Online action detection and anticipation aim to understand current or upcoming actions in video streams. In industry, current artificial neural network (ANN)-based methods suffer from prohibitive energy consumption, which fundamentally limits their deployment on resource-constrained industrial devices. To bridge the gap between theory and practice of informatics in industrial environments, we propose a novel knowledge distillation-based spiking neural network (KDSNN), which integrates bioinspired spike-driven processing with knowledge distillation and significantly reduces energy consumption. Specifically, KDSNN includes a pioneering spiking neural network (SNN) architecture for online action detection and anticipation, which combines a hierarchical spike convolutional neural network (CNN) block and a spike Transformer block to capture spike-driven information. To further improve the performance of our SNN while maintaining low energy consumption, we introduce the knowledge distillation paradigm, in which an expert-level ANN serves as a teacher to guide our SNN. Based on this, we propose a novel distillation loss consisting of feature distillation and logit distillation. Notably, to address cross-domain feature alignment in feature distillation, optimal transport theory is employed, for the first time, to realize cross-domain knowledge transfer by minimizing the Wasserstein distance between continuous features (ANNs) and discrete features (SNNs). In extensive evaluations on the THUMOS14 and EPIC-Kitchen-100 datasets, the energy consumption of our KDSNN is only 27.1% and 10.0%, respectively, of that of the state-of-the-art ANN-based method MAT. Equally importantly, the parameter count of our KDSNN is only 37.0% and 27.7% of that of MAT on THUMOS14 and EPIC-Kitchen-100, respectively.
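The feature-plus-logit distillation objective described above can be illustrated with a minimal Python sketch. It is not the KDSNN implementation. As a simplifying assumption, the feature term treats the ANN features and the SNN features (for example, firing rates) as two equal-size one-dimensional empirical distributions, for which the 1-Wasserstein distance has a closed form: the mean absolute difference of the sorted values. The weight `alpha`, the temperature `T`, and all function names are hypothetical.

```python
import math

def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1-D empirical distributions.

    With uniform weights on n points each, the optimal transport plan
    matches the i-th smallest value of `a` with the i-th smallest of `b`.
    """
    if len(a) != len(b):
        raise ValueError("this closed form assumes equal sample counts")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def softmax(logits, T=1.0):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(ann_feat, snn_feat, ann_logits, snn_logits, alpha=0.5, T=2.0):
    # Feature term: transport cost between continuous ANN features and spike-derived SNN features.
    feat_term = wasserstein_1d(ann_feat, snn_feat)
    # Logit term: temperature-softened KL divergence, scaled by T^2 as in standard KD.
    logit_term = kl_div(softmax(ann_logits, T), softmax(snn_logits, T)) * T * T
    return alpha * feat_term + (1 - alpha) * logit_term
```

Sorting makes the feature term independent of how the values are ordered, so it compares the two feature distributions rather than individual activations. General multi-dimensional optimal transport would instead require solving a transport problem, for example with Sinkhorn iterations.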
- Conference Article
26
- 10.1109/icpr48806.2021.9412147
- Jan 10, 2021
Spiking Neural Networks (SNNs) are energy-efficient computing architectures that, unlike classical Artificial Neural Networks (ANNs), exchange spikes to process information. This makes SNNs well suited for real-life deployments. However, like ANNs, SNNs benefit from deeper architectures to obtain improved performance, and their memory, compute, and power requirements also grow with model size, so model compression becomes a necessity. Knowledge distillation is a model compression technique that transfers the learning of a large machine learning model to a smaller model with minimal loss in performance. In this paper, we propose techniques for knowledge distillation in spiking neural networks for the task of image classification. We present ways to distill spikes from a larger SNN (the teacher network) to a smaller one (the student network) while minimally impacting classification accuracy. We demonstrate the effectiveness of the proposed method with detailed experiments on three standard datasets while proposing novel distillation methodologies and loss functions. We also present a multi-stage knowledge distillation technique for SNNs that uses an intermediate network to obtain higher performance from the student network. Our approach is expected to open up new avenues for deploying high-performing large SNN models on resource-constrained hardware platforms.
- Conference Article
3
- 10.1109/isqed57927.2023.10129306
- Apr 5, 2023
Building accurate and efficient deep neural network (DNN) models for intelligent sensing systems that process data locally is essential. Spiking neural networks (SNNs) have gained significant popularity in recent years because they are more biologically plausible and energy-efficient than DNNs. However, SNNs usually have lower accuracy than DNNs. In this paper, we propose to use SNNs for image sensing applications. Moreover, we introduce a DNN-SNN knowledge distillation algorithm to reduce the accuracy gap between DNNs and SNNs. Our DNN-SNN knowledge distillation improves the accuracy of an SNN by transferring knowledge between a DNN and an SNN. To transfer the knowledge better, our algorithm creates two learning paths from the DNN to the SNN: one between the output layers and another between intermediate layers. DNNs use real numbers to propagate information between neurons, while SNNs use 1-bit spikes. To enable communication between DNNs and SNNs, we use a decoder that decodes spikes into real numbers. Our algorithm also creates a learning path from the SNN to the DNN, which better adapts the DNN to the SNN by allowing the DNN to learn from the SNN. Our SNN models are deployed on Loihi, a specialized chip for SNN models. On the MNIST dataset, our SNN models trained by DNN-SNN knowledge distillation achieve better accuracy than SNN models trained by other algorithms and run on GPU, while consuming much less energy per image.
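One common way to decode spikes into real numbers is rate decoding: average each neuron's binary spike train over the time steps, then pass the rates through a learnable affine map so they can be compared with the DNN's intermediate features. The Python sketch below assumes rate decoding and a mean-squared-error intermediate-layer loss. The paper's actual decoder may differ, and `decode_rates`, `affine`, `intermediate_kd_loss`, `weights`, and `bias` are hypothetical names.

```python
def decode_rates(spikes):
    """Convert a spike train of shape [T][N] (0/1 entries) into N firing rates."""
    steps = len(spikes)
    n = len(spikes[0])
    return [sum(spikes[t][i] for t in range(steps)) / steps for i in range(n)]

def affine(x, weights, bias):
    """Learnable decoder y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def intermediate_kd_loss(snn_spikes, dnn_features, weights, bias):
    # Decode the SNN's spikes into real numbers, then match the DNN's real-valued features.
    decoded = affine(decode_rates(snn_spikes), weights, bias)
    return sum((d - f) ** 2 for d, f in zip(decoded, dnn_features)) / len(dnn_features)
```

In training, `weights` and `bias` would be learned jointly with the SNN. Here they are plain lists so the arithmetic is easy to follow.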
- Research Article
17
- 10.1016/j.neunet.2024.106475
- Jun 19, 2024
- Neural Networks
Self-architectural knowledge distillation for spiking neural networks
- Conference Article
2
- 10.1109/cine56307.2022.10037455
- Dec 1, 2022
Spiking Neural Networks (SNNs) can significantly enhance energy efficiency on neuromorphic hardware due to their sparse, biologically plausible, binary event (spike)-driven processing. However, because of the non-differentiable nature of a spiking neuron, training high-accuracy and low-latency SNNs is challenging, and recent research continues to look for ways to improve accuracy and latency. To address these issues, we propose a technique that combines Knowledge Distillation (KD) with the Batch Normalization Through Time (BNTT) method. BNTT enables low-latency and low-energy training of SNNs by allowing a neuron to handle its spike rate across different timesteps. The KD approach effectively transfers hidden information from the teacher model to the student network, which converts artificial neural network parameters to SNN weights. This improves SNN performance beyond that of the prior technique. Experiments are carried out on the Tiny-ImageNet, CIFAR-10, and CIFAR-100 datasets with various VGG architectures. We reach top-1 accuracy of 55.67% for ImageNet on VGG-11 and 73.11% for the CIFAR-100 dataset on VGG-16. These results demonstrate that our proposal outperforms earlier converted SNNs in accuracy with only 5 timesteps.
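The core idea of Batch Normalization Through Time is that normalization statistics and the learnable scale are kept separately for each timestep, so the network can treat early and late timesteps differently. The Python sketch below illustrates this with one scalar pre-activation per sample, followed by a leaky integrate-and-fire (LIF) update. It is a simplified illustration, not the cited implementation. The omission of a shift term, the LIF constants `leak=0.9` and `threshold=1.0`, and the function names are all assumptions.

```python
import math

def bntt(batch_by_time, gammas, eps=1e-5):
    """Normalize pre-activations separately at every time step.

    batch_by_time: list over time steps, each a list of scalar inputs (one per sample).
    gammas: one learnable scale per time step (the time-specific parameter).
    """
    out = []
    for x_t, gamma_t in zip(batch_by_time, gammas):
        mu = sum(x_t) / len(x_t)  # batch mean at this time step only
        var = sum((x - mu) ** 2 for x in x_t) / len(x_t)
        out.append([gamma_t * (x - mu) / math.sqrt(var + eps) for x in x_t])
    return out

def lif_step(u, x, leak=0.9, threshold=1.0):
    """One leaky integrate-and-fire update with reset-to-zero; returns (u, spike)."""
    u = leak * u + x
    if u >= threshold:
        return 0.0, 1
    return u, 0
```

Because each timestep has its own statistics and `gamma_t`, a time step whose inputs are uniformly large is not confused with one whose inputs vary, which is the property the abstract credits for low-latency training.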
- Conference Article
1
- 10.1109/ijcnn60899.2024.10650960
- Jun 30, 2024
Spiking Neural Network (SNN) is a kind of brain-inspired and event-driven network, which is becoming a promising energy-efficient alternative to Artificial Neural Networks (ANNs). However, the performance of directly trained SNNs is far from satisfactory. Inspired by the idea of Teacher–Student Learning, in this paper we study a novel learning method named SuperSNN, which uses an ANN model to guide SNN model learning. SuperSNN leverages knowledge distillation to learn comprehensive supervisory information from pre-trained ANN models, rather than solely from labeled data. Unlike previous work that naively matches SNN and ANN features without considering the precision mismatch, we propose an indirect relation-based approach, which defines a pairwise-relational loss function and unifies the value scale of ANN and SNN representation vectors, to alleviate the unexpected precision loss. This allows the knowledge of teacher ANNs to be effectively used to train student SNNs. Experimental results on three image datasets demonstrate that, whether homogeneous or heterogeneous teacher ANNs are used, SuperSNN can significantly improve the learning of student SNNs with only two time steps.
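A pairwise-relational loss of the kind described above can be sketched as follows. Each sample's representation vector is L2-normalized, which puts ANN and SNN vectors on a common value scale. The batch's cosine-similarity matrix is then built for each network, and the loss penalizes the difference between the two matrices. This is a generic relation-based distillation sketch in Python, not SuperSNN's exact loss, and the function names are hypothetical.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def relation_matrix(batch):
    """Pairwise cosine similarities between the samples of a batch (n x n)."""
    normed = [l2_normalize(v) for v in batch]
    return [[sum(a * b for a, b in zip(u, w)) for w in normed] for u in normed]

def relation_loss(ann_batch, snn_batch):
    # Compare how samples relate to each other, not the raw feature values.
    ra, rs = relation_matrix(ann_batch), relation_matrix(snn_batch)
    n = len(ra)
    return sum((ra[i][j] - rs[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)
```

Since only the n by n relation matrices are compared, the ANN and SNN feature dimensions need not match. Rescaling either network's vectors does not change the loss, which is one way to sidestep the precision and scale mismatch between real-valued and spike-based representations.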
- Research Article
7
- 10.3390/biomimetics8040375
- Aug 18, 2023
- Biomimetics
Spiking neural networks (SNNs) are widely recognized for their biomimetic and efficient computing features. They use spikes to encode and transmit information. Despite their many advantages, SNNs suffer from low accuracy and large inference latency, caused respectively by direct training and by conversion from artificial neural network (ANN) training methods. To address these limitations, we propose a novel training pipeline (called IDSNN) based on parameter initialization and knowledge distillation, using an ANN as both a parameter source and a teacher. IDSNN maximizes the knowledge extracted from ANNs and achieves competitive top-1 accuracy on CIFAR10 (94.22%) and CIFAR100 (75.41%) with low latency. More importantly, it converges 14× faster than directly trained SNNs under limited training resources, which demonstrates its practical value in applications.
- Research Article
1
- 10.1609/aaai.v39i16.33895
- Apr 11, 2025
- Proceedings of the AAAI Conference on Artificial Intelligence
Spiking Neural Networks (SNNs) are promising for low-power computation due to their event-driven mechanism but often suffer from lower accuracy compared to Artificial Neural Networks (ANNs). ANN-to-SNN knowledge distillation can improve SNN performance, but previous methods either focus solely on label information, missing valuable intermediate layer features, or use a layer-wise approach that neglects spatial and temporal semantic inconsistencies, leading to performance degradation. To address these limitations, we propose a novel method called self-attentive spatio-temporal calibration (SASTC). SASTC uses self-attention to identify semantically aligned layer pairs between ANN and SNN, both spatially and temporally. This enables the autonomous transfer of relevant semantic information. Extensive experiments show that SASTC outperforms existing methods, effectively solving the mismatching problem. Accuracy results include 95.12% on CIFAR-10 and 79.40% on CIFAR-100 with 2 time steps, and 68.69% on ImageNet with 4 time steps for static datasets, as well as 97.92% on DVS-Gesture and 83.60% on DVS-CIFAR10 for neuromorphic datasets. This marks the first time SNNs have outperformed ANNs on both CIFAR-10 and CIFAR-100, shedding new light on the potential applications of SNNs.
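The idea of letting attention, rather than a fixed layer-to-layer mapping, decide which ANN layer each SNN feature should match can be sketched in Python as follows. This is a simplified illustration, not SASTC itself. It assumes that every feature has already been pooled and projected to a common dimension, uses raw scaled dot products instead of learned query and key projections, and treats each SNN layer (or layer-and-time-step pair) as one entry. `attentive_alignment_loss` is a hypothetical name.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_alignment_loss(snn_feats, ann_feats):
    """Weight every (SNN entry, ANN layer) pair by attention instead of a fixed layer map.

    snn_feats: list of vectors, one per SNN layer (or per layer-and-time-step pair).
    ann_feats: list of vectors, one per ANN layer, with the same dimension as snn_feats.
    """
    d = len(ann_feats[0])
    total = 0.0
    for s in snn_feats:
        # Scaled dot-product scores of this SNN entry against every ANN layer.
        scores = [sum(x * y for x, y in zip(s, a)) / math.sqrt(d) for a in ann_feats]
        weights = softmax(scores)
        for w, a in zip(weights, ann_feats):
            total += w * sum((x - y) ** 2 for x, y in zip(s, a))
    return total / len(snn_feats)
```

Because the weights come from a softmax, an SNN entry that strongly resembles one ANN layer draws almost all of its loss from that layer, so mismatched pairs are down-weighted. Raw dot products also favor larger-norm features, which is one reason a practical method would learn the projections.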
- Research Article
48
- 10.1038/s41467-024-51110-5
- Aug 9, 2024
- Nature Communications
Communication by rare, binary spikes is a key factor in the energy efficiency of biological brains. However, biologically inspired spiking neural networks are harder to train than artificial neural networks. This is puzzling given that theoretical results provide exact mapping algorithms from artificial to spiking neural networks with time-to-first-spike coding. In this paper we analyze, in theory and simulation, the learning dynamics of time-to-first-spike networks and identify a specific instance of the vanishing-or-exploding gradient problem. While two choices of spiking neural network mappings solve this problem at initialization, only the one with a constant slope of the neuron membrane potential at threshold guarantees the equivalence of the training trajectory between spiking and artificial neural networks with rectified linear units. For specific image classification architectures comprising feed-forward dense or convolutional layers, we demonstrate that deep spiking neural network models can be effectively trained from scratch on the MNIST and Fashion-MNIST datasets, or fine-tuned on large-scale datasets such as CIFAR10, CIFAR100 and PLACES365, to achieve exactly the same performance as artificial neural networks, surpassing previous spiking neural networks. Our approach achieves high-performance classification with fewer than 0.3 spikes per neuron, lending itself to an energy-efficient implementation. We also show that fine-tuning spiking neural networks with our robust gradient descent algorithm enables their optimization for hardware implementations with low latency and resilience to noise and quantization.
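Time-to-first-spike coding represents a value by when a neuron fires rather than how often: larger activations fire earlier, and each neuron fires at most once. The Python sketch below shows a generic linear TTFS encoding of ReLU activations and its inverse. It does not reproduce the paper's mapping, which is defined through the neuron's membrane dynamics with a constant slope at threshold. The window `t_max`, the clipping value `x_max`, and the function names are assumptions.

```python
def ttfs_encode(activations, t_max=1.0, x_max=1.0):
    """Map ReLU activations to spike times: larger values fire earlier.

    Non-positive activations produce no spike (None); values are clipped to x_max.
    """
    times = []
    for x in activations:
        if x <= 0:
            times.append(None)
        else:
            times.append(t_max * (1.0 - min(x, x_max) / x_max))
    return times

def ttfs_decode(times, t_max=1.0, x_max=1.0):
    """Inverse map: one spike time per neuron carries the whole activation."""
    return [0.0 if t is None else x_max * (1.0 - t / t_max) for t in times]
```

Neurons whose ReLU output is zero stay silent, which is how TTFS networks can average well under one spike per neuron.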
- Conference Article
21
- 10.1109/icassp40776.2020.9053914
- May 1, 2020
Spiking Neural Networks (SNNs), widely known as the third generation of neural networks, encode input information temporally using sparse spiking events, which can be harnessed to achieve higher computational efficiency for cognitive tasks. However, given the rapid strides in accuracy enabled by state-of-the-art Analog Neural Networks (ANNs), SNN training algorithms are much less mature, leading to an accuracy gap between SNNs and ANNs. In this paper, we propose different SNN training methodologies, varying in degrees of biofidelity, and evaluate their efficacy on complex image recognition datasets. First, we present biologically plausible Spike Timing Dependent Plasticity (STDP)-based deterministic and stochastic algorithms for unsupervised representation learning in SNNs. Our analysis on the CIFAR-10 dataset indicates that STDP-based learning rules enable the convolutional layers to self-learn low-level input features using fewer training examples. However, STDP-based learning is limited in applicability to shallow SNNs (≤4 layers) and yields accuracy considerably below the state of the art. To scale SNNs deeper and further improve accuracy, we propose a conversion methodology that maps off-the-shelf trained ANNs to SNNs for energy-efficient inference. We demonstrate 69.96% accuracy for VGG16-SNN on ImageNet. However, ANN-to-SNN conversion leads to high inference latency for achieving the best accuracy. To minimize the inference latency, we propose a spike-based error backpropagation algorithm using a differentiable approximation of the spiking neuron. Our preliminary experiments on CIFAR-10 show that spike-based error backpropagation effectively captures temporal statistics to reduce the inference latency by up to 8× compared to converted SNNs while yielding comparable accuracy.
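For readers unfamiliar with STDP, the classic pair-based rule strengthens a synapse when the presynaptic spike precedes the postsynaptic one and weakens it otherwise, with an effect that decays exponentially in the spike-time difference. The Python sketch below implements that textbook rule with weight clipping. It is not the deterministic or stochastic variant from the paper, and the constants `a_plus`, `a_minus`, `tau_plus`, `tau_minus`, and the weight bounds are illustrative assumptions.

```python
import math

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair (times in ms).

    Pre before post (dt > 0) potentiates; post before pre (dt < 0) depresses.
    """
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

def apply_stdp(w, pre_times, post_times, w_min=0.0, w_max=1.0):
    """Sum the contributions of all spike pairs, then clip the weight to [w_min, w_max]."""
    for t_pre in pre_times:
        for t_post in post_times:
            w += stdp_dw(t_pre, t_post)
    return min(max(w, w_min), w_max)
```

The rule uses only locally available spike times and no error signal, which is why it supports unsupervised feature learning but, as the abstract notes, has been hard to scale beyond shallow networks.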
- Conference Article
27
- 10.1109/coolchips52128.2021.9410323
- Apr 14, 2021
Spiking neural networks (SNNs), which enable greater computational efficiency on neuromorphic hardware, have attracted attention. Existing ANN-SNN conversion methods can effectively convert the weights of a pre-trained ANN model to an SNN. However, state-of-the-art ANN-SNN conversion methods still suffer from accuracy loss and high inference latency. To solve this problem, we train low-latency SNNs through knowledge distillation with Kullback-Leibler divergence (KL divergence). We achieve 74.42% accuracy on CIFAR-100 with the VGG16 architecture using only 5 timesteps. To the best of our knowledge, our work achieves the fastest inference without accuracy loss compared with other state-of-the-art SNN models.
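Knowledge distillation with KL divergence is usually written as a weighted sum of a temperature-softened KL term between teacher and student outputs and an ordinary cross-entropy term on the hard label. The Python sketch below shows that standard Hinton-style loss. Averaging the SNN's outputs over timesteps to obtain student logits is an assumption, as are the values `T=4.0` and `alpha=0.9` and the helper names. The paper's exact formulation may differ.

```python
import math

def softmax(logits, T=1.0):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_over_time(outputs_per_step):
    """Average the SNN's per-timestep outputs to obtain one logit vector."""
    steps = len(outputs_per_step)
    return [sum(step[k] for step in outputs_per_step) / steps for k in range(len(outputs_per_step[0]))]

def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.9):
    """Softened KL(teacher || student), scaled by T^2, plus hard-label cross-entropy."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * (T * T) * kl + (1 - alpha) * ce
```

The `T * T` factor keeps the gradient magnitude of the softened term roughly independent of the temperature, so `alpha` can be tuned without re-balancing for each `T`.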
- Research Article
9
- 10.14704/web/v19i1/web19001
- Dec 24, 2021
- Webology
The goal of this paper is to use artificial intelligence to build and evaluate an adaptive learning system in which we adopt the basic approaches of both spiking neural networks and artificial neural networks. Spiking neural networks receive increasing attention due to their advantages over traditional artificial neural networks: they have proven to be energy efficient, biologically plausible, and up to 10^5 times faster if they are simulated on analogue traditional learning systems. Artificial neural network libraries use computational graphs as a pervasive representation; spiking models, however, remain heterogeneous and difficult to train. Using the artificial intelligence deductive method, the paper posits two hypotheses that examine whether 1) there exists a common representation for both neural network paradigms for tutorial mentoring, and 2) spiking and non-spiking models can learn a simple recognition task for learning activities in adaptive learning. The first hypothesis is confirmed by specifying and implementing a domain-specific language that generates semantically similar spiking and non-spiking neural networks for tutorial mentoring. Through three classification experiments, the second hypothesis is shown to hold for non-spiking models but cannot be proven for spiking models. The paper contributes three findings: 1) a domain-specific language for modelling neural network topologies in adaptive tutorial mentoring for students, 2) a preliminary model for generalizable learning through back-propagation in spiking neural networks for student learning activities, also presented in the results section, and 3) a method for transferring optimised non-spiking parameters to spiking neural networks in the adaptive learning system. The latter contribution is promising because the vast machine learning literature can spill over to the emerging fields of spiking neural networks and adaptive learning computing.
Future work includes improving the back-propagation model, exploring time-dependent models for learning, and adding support for adaptive learning systems.
- Research Article
- 10.1088/2634-4386/ade821
- Jul 3, 2025
- Neuromorphic Computing and Engineering
Spiking neural networks (SNNs) are renowned for their energy efficiency and bio-fidelity, but their widespread adoption is hindered by challenges in training, primarily due to the non-differentiability of spiking activations and limited representational capacity. Existing approaches, such as artificial neural network (ANN)-to-SNN conversion and surrogate gradient learning, either suffer from prolonged simulation times or suboptimal performance. To address these challenges, we provide a novel perspective that frames knowledge distillation as a hybrid training strategy, effectively combining knowledge transfer from pretrained models with spike-based gradient learning. This approach leverages the complementary benefits of both paradigms, enabling the development of high-performance, low-latency SNNs. Our approach features a lightweight affine projector that facilitates flexible representation alignment across diverse network architectures and neuron types. We further empirically demonstrate that the effectiveness of distillation is robust, irrespective of whether high-precision membrane potentials or binary spike trains are used as features. Through a quantitative measure of the consistency between model predictions and the saliency of relevant input pixels, we show that knowledge transfer is grounded in a shared understanding of salient features, rather than the exact replication of numerical activations. This framework represents a significant step towards enabling SNNs to achieve accuracy levels that are competitive with those of their ANN counterparts, while maintaining a minimal number of timesteps. For instance, applying our method to ResNet-18 on CIFAR-100 attains 80.48% accuracy with just four timesteps, surpassing the equivalent ANN (79.90%) and yielding a 3.49% improvement over non-distilled SNNs.
- Research Article
- 10.1609/aaai.v40i34.40085
- Mar 14, 2026
- Proceedings of the AAAI Conference on Artificial Intelligence
Knowledge distillation from Artificial Neural Networks (ANNs) to Spiking Neural Networks (SNNs) is a prominent training paradigm. However, its efficacy is fundamentally limited by a spectral mismatch: SNNs, with their intrinsic low-pass filtering characteristics, struggle to learn high-frequency details from their ANN teachers, creating a bottleneck in knowledge transfer at both the feature and logit levels. To address this, we propose Bi-Spectrum Distillation (BSD), a novel framework that mitigates the mismatch from two complementary perspectives. First, at the feature level, our Spectral Residual Distillation (SRD) enhances the student SNN's features with a parameter-efficient, learnable filter that adaptively compensates for high-frequency information loss, transforming the student's output to better match the teacher's rich spectral target. Second, at the logit level, our Spectral Semantic Distillation (SSD) enhances fine-grained classification by distilling high-frequency components from teacher-ordered logits. Extensive experiments on CIFAR-10/100, ImageNet, and CIFAR10-DVS demonstrate that BSD achieves new state-of-the-art performance across both CNN and Transformer-based SNNs, validating its effectiveness and broad applicability.
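The spectral mismatch described above can be illustrated with a small Python sketch. A leaky integrator, standing in for the low-pass behaviour of spiking neurons, strongly attenuates the fastest possible oscillation. A loss restricted to the high-frequency bins of a discrete Fourier transform then measures exactly the detail the student misses. This is not BSD's learnable filter or its logit-level procedure. The `leak` constant, the `cutoff` bin index, and all function names are hypothetical, and the naive O(n^2) DFT is used only to keep the example dependency-free.

```python
import cmath

def leaky_integrate(x, leak=0.8):
    """First-order low-pass filter, y[t] = leak * y[t-1] + (1 - leak) * x[t]."""
    y, out = 0.0, []
    for v in x:
        y = leak * y + (1 - leak) * v
        out.append(y)
    return out

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n for t in range(n)]

def high_pass(x, cutoff):
    """Keep only bins whose frequency index (mirrored for negative frequencies) is >= cutoff."""
    X = dft(x)
    n = len(X)
    kept = [X[k] if min(k, n - k) >= cutoff else 0 for k in range(n)]
    return idft(kept)

def high_freq_loss(student, teacher, cutoff=2):
    # Mean squared error computed only on the high-frequency parts of both signals.
    hs, ht = high_pass(student, cutoff), high_pass(teacher, cutoff)
    return sum((a - b) ** 2 for a, b in zip(hs, ht)) / len(hs)
```

For example, an alternating teacher signal such as `[1.0, -1.0] * 4` survives `high_pass` unchanged, while its leaky-integrated version stays small in amplitude, so `high_freq_loss` between the two is large even though their means agree.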
- Research Article
2
- 10.1016/j.neunet.2025.107478
- Aug 1, 2025
- Neural networks : the official journal of the International Neural Network Society
S4-KD: A single step spiking SiamFC++ for object tracking with knowledge distillation.