ATW-SNN: low-accuracy-loss asymmetric ternary-weight spiking neural network with spike calibration strategy
Abstract The deployment of spiking neural networks (SNNs) on resource-constrained devices is mainly limited by memory footprint and energy consumption. Specifically, these limitations stem from the overhead incurred by full-precision storage and floating-point operations. To address these bottlenecks, we propose the Asymmetric Ternary Weight SNN (ATW-SNN): by quantizing weights into asymmetric ternary values, ATW-SNN effectively reduces memory requirements and the number of floating-point operations. To mitigate the accuracy loss induced by quantization, we introduce a spike calibration strategy (SCS). This strategy leverages cosine distance to evaluate the discrepancy in spike distribution between ATW-SNN and the full-precision SNN, and conducts knowledge distillation based on this discrepancy. This process is designed to align the output of ATW-SNN with that of the full-precision SNN, thereby maintaining comparable performance. Moreover, we design an inference process for ATW-SNN on edge devices that changes the computing paradigm to utilize sparse weights and integer accumulation for enhanced energy efficiency. We conducted experiments on CIFAR-10, CIFAR-100, and DVS-Gesture, and the results show that the accuracy loss is less than 0.9%, while the weight memory is compressed by at least 14.91× and energy consumption is reduced by at least 4.15×. This work provides a feasible solution for the energy-efficient deployment of SNNs on edge devices.
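The abstract does not give the quantizer or the calibration loss in closed form. A minimal sketch of how asymmetric ternary quantization and a cosine-distance distillation term could look; the thresholds, the mean-magnitude scale choice, and the function names are assumptions for illustration, not the paper's actual method:

```python
import numpy as np

def ternarize_asymmetric(w, pos_thresh=0.05, neg_thresh=-0.05):
    """Map full-precision weights to {alpha_n, 0, alpha_p}, with separate
    positive/negative scales (hypothetical thresholds and scale rule)."""
    pos_mask = w > pos_thresh
    neg_mask = w < neg_thresh
    # One scale per side: the mean of the surviving weights on that side.
    alpha_p = w[pos_mask].mean() if pos_mask.any() else 0.0
    alpha_n = w[neg_mask].mean() if neg_mask.any() else 0.0
    q = np.zeros_like(w)
    q[pos_mask] = alpha_p
    q[neg_mask] = alpha_n
    return q

def cosine_distillation_loss(student_rates, teacher_rates):
    """Cosine distance (1 - cosine similarity) between the quantized and
    full-precision networks' spike-rate vectors."""
    num = float(np.dot(student_rates, teacher_rates))
    denom = np.linalg.norm(student_rates) * np.linalg.norm(teacher_rates) + 1e-12
    return 1.0 - num / denom
```

Minimizing the cosine term pushes the ternary network's spike distribution toward the full-precision teacher's, which is the alignment the SCS aims for.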
- Research Article
1
- 10.3389/fnins.2024.1440000
- Sep 4, 2024
- Frontiers in neuroscience
Spiking neural networks (SNNs) have received increasing attention due to their high biological plausibility and energy efficiency. The binary spike-based information propagation enables efficient sparse computation in event-based and static computer vision applications. However, the weight precision and especially the membrane potential precision remain high (e.g., 32 bits) in state-of-the-art SNN algorithms. Each neuron in an SNN stores the membrane potential over time and typically updates its value in every time step. Such frequent read/write operations on the high-precision membrane potential incur storage and memory-access overhead in SNNs, which undermines the SNNs' compatibility with resource-constrained hardware. To resolve this inefficiency, prior works have explored time-step reduction and low-precision representation of the membrane potential at a limited scale and reported significant accuracy drops. Furthermore, while recent advances in on-device AI present pruning and quantization optimization with different architectures and datasets, simultaneous pruning with quantization is highly under-explored in SNNs. In this work, we present SpQuant-SNN, a fully-quantized spiking neural network with ultra-low-precision weights, ultra-low-precision membrane potential, and high spatial-channel sparsity, enabling end-to-end low-precision computation with significantly reduced operations. First, we propose an integer-only quantization scheme for the membrane potential with a stacked surrogate gradient function, a simple-yet-effective method that enables a smooth learning process for quantized SNN training. Second, we implement spatial-channel pruning with a membrane-potential prior, reducing the layer-wise computational complexity and floating-point operations (FLOPs) in SNNs. Finally, to further improve the accuracy of the low-precision and sparse SNN, we propose a self-adaptive learnable potential threshold for SNN training.
Equipped with high biological adaptiveness, minimal computations, and memory utilization, SpQuant-SNN achieves state-of-the-art performance across multiple SNN models for both event-based and static image datasets, including both image classification and object detection tasks. The proposed SpQuant-SNN achieved up to 13× memory reduction and >4.7× FLOPs reduction with < 1.8% accuracy degradation for both classification and object detection tasks, compared to the SOTA baseline.
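As a rough illustration of integer-only membrane-potential dynamics of the kind SpQuant-SNN describes, the update below keeps all state in `int32`; the shift-based leak, the threshold value, and the subtractive soft reset are assumptions, not the paper's exact scheme:

```python
import numpy as np

def int_lif_step(v, in_current, thresh=64, leak_shift=1):
    """One integer-only LIF step: leak by arithmetic right shift, accumulate
    integer synaptic input, fire, and soft-reset by subtraction. No
    floating-point operations are involved (parameters are illustrative)."""
    v = (v >> leak_shift) + in_current          # leak + integrate, integer ops only
    spikes = (v >= thresh).astype(np.int32)     # fire where the threshold is crossed
    v = np.where(spikes == 1, v - thresh, v)    # soft reset keeps the residual charge
    return v, spikes
```

Because both the potential and the threshold are integers, the per-neuron state can be stored in a few bits and updated without any floating-point hardware, which is the storage/access saving the abstract targets.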
- Research Article
- 10.71465/csb169
- Dec 28, 2025
- Computer Science Bulletin
Spiking Neural Networks (SNNs) have emerged as a promising paradigm for energy-efficient machine learning, leveraging event-driven computation and sparse data processing to mimic biological neural mechanisms. However, deploying high-performance SNNs on resource-constrained edge neuromorphic hardware remains a significant challenge due to the high memory footprint and computational costs associated with full-precision weights and membrane potentials. While quantization is a well-established technique for Artificial Neural Networks (ANNs), its direct application to SNNs is complicated by the non-differentiable nature of spiking functions and the temporal dynamics of neuronal states. This paper presents a comprehensive framework for Efficient Spiking Neural Networks via Quantization-Aware Training (QAT). We propose a novel differentiable quantization scheme that integrates learnable step-size parameters directly into the surrogate gradient learning loop, allowing the network to adapt its dynamic range during training. Furthermore, we introduce a bit-width adaptive Leaky Integrate-and-Fire (LIF) neuron model that mitigates the information loss typically observed in low-precision spiking regimes. Our approach is validated on standard static and neuromorphic datasets, demonstrating that 4-bit quantized SNNs can achieve accuracy comparable to full-precision counterparts while reducing memory usage by 75% and energy consumption by over 60% on edge devices.
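The learnable step-size scheme described above resembles LSQ-style quantization. A hedged forward-pass sketch follows, with the step size treated as a given scalar; in actual quantization-aware training it would be a learnable parameter updated through the surrogate-gradient/straight-through machinery the abstract mentions:

```python
import numpy as np

def lsq_quantize(w, step, n_bits=4):
    """LSQ-style forward pass: scale by the (learnable) step size, clip to
    the signed n-bit integer range, round, and rescale. Names and defaults
    are illustrative, not the paper's exact formulation."""
    qmax = 2 ** (n_bits - 1) - 1     # e.g. +7 for 4 bits
    qmin = -(2 ** (n_bits - 1))      # e.g. -8 for 4 bits
    w_scaled = np.clip(w / step, qmin, qmax)
    return np.round(w_scaled) * step
```

Letting the network learn `step` is what allows the dynamic range to adapt during training, rather than being fixed by a calibration pass.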
- Research Article
13
- 10.1109/tbcas.2023.3279367
- Jun 1, 2023
- IEEE Transactions on Biomedical Circuits and Systems
Implementing neural networks (NN) on edge devices enables AI to be applied in many daily scenarios. The stringent area and power budget on edge devices impose challenges on conventional NNs with massive energy-consuming Multiply-Accumulation (MAC) operations and offer an opportunity for Spiking Neural Networks (SNN), which can be implemented within a sub-mW power budget. However, mainstream SNN topologies vary from Spiking Feedforward Neural Network (SFNN) and Spiking Recurrent Neural Network (SRNN) to Spiking Convolutional Neural Network (SCNN), and it is challenging for an edge SNN processor to adapt to different topologies. Besides, online learning ability is critical for edge devices to adapt to local environments but comes with dedicated learning modules, further increasing area and power consumption burdens. To alleviate these problems, this work proposes RAINE, a reconfigurable neuromorphic engine supporting multiple SNN topologies and a dedicated trace-based rewarded spike-timing-dependent plasticity (TR-STDP) learning algorithm. Sixteen Unified-Dynamics Learning-Engines (UDLEs) are implemented in RAINE to realize a compact and reconfigurable implementation of different SNN operations. Three topology-aware data reuse strategies are proposed and analyzed to optimize the mapping of different SNNs on RAINE. A 40-nm prototype chip is fabricated, achieving an energy-per-synaptic-operation (SOP) of 6.2 pJ/SOP at 0.51 V and a power consumption of 510 μW at 0.45 V. Finally, three examples with different SNN topologies, including SRNN-based ECG arrhythmia detection, SCNN-based 2D image classification, and end-to-end on-chip learning for MNIST digit recognition, are demonstrated on RAINE with ultra-low energy consumption of 97.7 nJ/step, 6.28 μJ/sample, and 42.98 μJ/sample, respectively. These results show the feasibility of obtaining high reconfigurability and low power consumption simultaneously on an SNN processor.
- Research Article
6
- 10.1109/tnnls.2024.3352653
- Feb 1, 2025
- IEEE transactions on neural networks and learning systems
With the help of special neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with less energy consumption. Combining SNNs with deep reinforcement learning (DRL) provides a promising, energy-efficient approach to realistic control tasks. In this article, we focus on tasks in which the agent must learn multidimensional deterministic control policies, which are very common in real scenarios. Recently, the surrogate gradient method has been utilized for training multilayer SNNs, which allows SNNs to achieve performance comparable to the corresponding deep networks on this task. Most existing spike-based reinforcement learning (RL) methods take the firing rate as the output of SNNs and convert it to represent a continuous action space (i.e., the deterministic policy) through a fully connected (FC) layer. However, the decimal characteristic of the firing rate brings floating-point matrix operations to the FC layer, making the whole SNN unable to be deployed on neuromorphic hardware directly. To develop a fully spiking actor network (SAN) without any floating-point matrix operations, we draw inspiration from the nonspiking interneurons found in insects and employ the membrane voltage of nonspiking neurons to represent the action. Before the nonspiking neurons, multiple population neurons are introduced to decode different dimensions of actions. Since each population is used to decode one dimension of the action, we argue that the neurons in each population should be connected in both the time domain and the space domain. Hence, intralayer connections are used in the output populations to enhance the representation capacity. This mechanism exists extensively in animals and has been demonstrated to be effective. Finally, we propose a fully spiking actor network with intralayer connections (ILC-SAN).
Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art methods on continuous control tasks from OpenAI Gym. Moreover, we estimate the theoretical energy consumption when deploying ILC-SAN on neuromorphic chips to illustrate its high energy efficiency.
- Research Article
- 10.62311/nesx/rphcr2
- May 26, 2025
- International Journal of Academic and Industrial Research Innovations(IJAIRI)
Abstract: Spiking Neural Networks (SNNs) represent a promising frontier in neuromorphic computing, enabling low-power, high-efficiency computations for edge intelligence. This paper investigates the application of SNNs in real-time edge scenarios such as sensor fusion, event-based vision, and speech recognition. By comparing SNNs to conventional deep neural networks (DNNs), we demonstrate significant reductions in energy consumption and latency while maintaining competitive accuracy. We employ simulation frameworks like PyTorch and TensorFlow, model interpretability tools like SHAP and LIME, and perform regression and predictive analyses to assess performance. Results indicate that SNN-based models achieve up to 65% lower energy consumption compared to DNNs on edge devices while delivering acceptable performance trade-offs.
Keywords: Spiking Neural Networks, SNNs, Edge Intelligence, Neuromorphic Computing, Energy Efficiency, TensorFlow, PyTorch, SHAP, LIME
- Research Article
12
- 10.3390/s23146548
- Jul 20, 2023
- Sensors
Spiking neural networks (SNNs) have attracted considerable attention as third-generation artificial neural networks, known for their powerful, intelligent features and energy-efficiency advantages. These characteristics render them ideally suited for edge computing scenarios. Nevertheless, the current mapping schemes for deploying SNNs onto neuromorphic hardware face limitations such as extended execution times, low throughput, and insufficient consideration of energy consumption and connectivity, which undermine their suitability for edge computing applications. To address these challenges, we introduce EdgeMap, an optimized mapping toolchain specifically designed for deploying SNNs onto edge devices without compromising performance. EdgeMap consists of two main stages. The first stage involves partitioning the SNN graph into small neuron clusters based on a streaming graph partition algorithm, with the sizes of neuron clusters limited by the physical neuron cores. In the subsequent mapping stage, we adopt a multi-objective optimization algorithm specifically geared towards mitigating energy costs and communication costs for efficient deployment. Evaluated across four typical SNN applications, EdgeMap substantially outperforms other state-of-the-art mapping schemes. The performance improvements include a reduction in average latency by up to 19.8%, energy consumption by 57%, and communication cost by 58%. Moreover, EdgeMap exhibits an impressive enhancement in execution time by a factor of 1225.44×, alongside a throughput increase of up to 4.02×. These results highlight EdgeMap's efficiency and effectiveness, emphasizing its utility for deploying SNN applications in edge computing scenarios.
- Conference Article
4
- 10.23919/irs54158.2022.9904979
- Sep 12, 2022
Radar-based hand gesture recognition is a promising alternative to camera-based solutions since radar is not impacted by lighting conditions and raises no privacy concerns. Energy consumption is a key concern for radar applications on edge devices. Thus, a time-domain-based training approach that avoids the computationally expensive pre-processing fast Fourier transform (FFT) steps and utilizes time-domain radar data has been used. Spiking neural networks (SNNs) are recognized as being lower-power and more energy-efficient than artificial neural networks (ANNs). Therefore, we used the time-domain training approach alongside SNNs to conserve the most energy. This work evaluates several convolution-based SNNs and their ANN variants to determine the SNNs' appropriateness for temporally based datasets and their ability to learn complex spatio-temporal features. All models were trained using only time-domain data and then used to classify ten different gestures recorded by five different people using a 60 GHz frequency-modulated continuous-wave (FMCW) radar sensor. The results indicate the effectiveness of the time-domain training approach and the ability of SNNs to outperform their ANN counterparts.
- Conference Article
10
- 10.1109/conit55038.2022.9847977
- Jun 24, 2022
Deep Neural Networks are known for their applications in domains like computer vision, natural language processing, speech recognition, and pattern recognition. Though these models are incredibly powerful, they consume a considerable amount of memory bandwidth, storage, and other computational resources. These heavy models can be successfully executed on machines with CPU/GPU/TPU support. It becomes difficult for embedded devices to execute them, as such devices are computationally constrained. In order to ease the deployment of these models onto embedded devices, we need to optimize them. Optimization of a model refers to a decrease in model size without compromising performance metrics such as model accuracy, number of FLOPs, and model parameters. We present a hybrid optimization method to address this problem. Hybrid optimization is a two-phase technique: pruning followed by quantization. Pruning is the process of eliminating inessential weights and connections in order to reduce the model size. Once the unnecessary parameters are removed, the weights of the remaining parameters are converted into 8-bit integer values, a step termed quantization of the model. We verify and validate the performance of this hybrid optimization technique on an image classification task on the CIFAR-10 dataset. We performed the hybrid optimization process for three heavyweight models in this work, namely ResNet56, ResNet110, and GoogleNet. On average, the difference in the number of FLOPs and parameters is 40%. The reduction in the number of parameters and FLOPs has a negligible effect on model performance, and the variation in accuracy is less than 2%. Further, the optimized model is deployed on an edge device and embedded platform, the NVIDIA Jetson TX2 module.
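The two-phase pipeline (magnitude pruning, then 8-bit integer quantization) can be sketched as follows; the sparsity target, the symmetric per-tensor scale, and the tie handling are illustrative choices, not the paper's exact recipe:

```python
import numpy as np

def prune_and_quantize(w, sparsity=0.5):
    """Phase 1: magnitude pruning - zero out the smallest-magnitude weights
    until the target sparsity is reached. Phase 2: symmetric 8-bit integer
    quantization of the survivors. Returns (int8 weights, float scale)."""
    k = int(sparsity * w.size)
    thresh = np.sort(np.abs(w).ravel())[k] if k < w.size else np.inf
    pruned = np.where(np.abs(w) >= thresh, w, 0.0)
    # One scale for the whole tensor; guard against an all-zero tensor.
    scale = max(np.abs(pruned).max() / 127.0, 1e-12)
    q = np.round(pruned / scale).astype(np.int8)
    return q, scale
```

At inference, the dequantized weights are simply `q * scale`, so storage drops from 32 bits to 8 bits per surviving weight plus one scale per tensor.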
- Conference Article
86
- 10.1109/cvpr.2019.00044
- Jun 1, 2019
Recently, there has been a lot of interest in building compact models for video classification which have a small memory footprint (<1 GB). While these models are compact, they typically operate by repeated application of a small weight matrix to all the frames in a video. For example, recurrent neural network based methods compute a hidden state for every frame of the video using a recurrent weight matrix. Similarly, cluster-and-aggregate based methods such as NetVLAD have a learnable clustering matrix which is used to assign soft-clusters to every frame in the video. Since these models look at every frame in the video, the number of floating point operations (FLOPs) is still large even though the memory footprint is small. In this work, we focus on building compute-efficient video classification models which process fewer frames and hence have fewer FLOPs. Similar to memory-efficient models, we use the idea of distillation, albeit in a different setting. Specifically, in our case, a compute-heavy teacher which looks at all the frames in the video is used to train a compute-efficient student which looks at only a small fraction of frames in the video. This is in contrast to a typical memory-efficient Teacher-Student setting, wherein both the teacher and the student look at all the frames in the video but the student has fewer parameters. Our work thus complements the research on memory-efficient video classification. We do an extensive evaluation with three types of models for video classification, viz., (i) recurrent models, (ii) cluster-and-aggregate models, and (iii) memory-efficient cluster-and-aggregate models, and show that in each of these cases, a see-it-all teacher can be used to train a compute-efficient see-very-little student. Overall, we show that the proposed student network can reduce the inference time by 30% and the number of FLOPs by approximately 90% with a negligible drop in performance.
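The see-it-all teacher / see-very-little student setup can be illustrated with mean-pooled per-frame logits; the aggregation, the frame stride, and the soft-target cross-entropy below are simplifying assumptions, not the paper's exact training objective:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def distill_targets(frame_logits, student_stride=4):
    """The teacher aggregates logits over all frames; the student sees only
    every `student_stride`-th frame and is trained to match the teacher's
    soft output (illustrative mean-pooling aggregation). Returns the
    teacher's soft targets and the student's cross-entropy against them."""
    teacher_soft = softmax(frame_logits.mean(axis=0))     # sees every frame
    student_frames = frame_logits[::student_stride]       # sees a small fraction
    student_soft = softmax(student_frames.mean(axis=0))
    loss = -np.sum(teacher_soft * np.log(student_soft + 1e-12))
    return teacher_soft, loss
```

Since the student evaluates the backbone on roughly `1/student_stride` of the frames, its FLOPs shrink proportionally while the distillation loss keeps its predictions anchored to the teacher's.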
- Conference Article
1
- 10.24963/ijcai.2024/768
- Aug 1, 2024
Most edge-cloud collaboration frameworks rely on the substantial computational and storage capabilities of cloud-based artificial neural networks (ANNs). However, this reliance results in significant communication overhead between edge devices and the cloud, as well as high computational energy consumption, especially when applied to resource-constrained edge devices. To address these challenges, we propose ECC-SNN, a novel edge-cloud collaboration framework that incorporates energy-efficient spiking neural networks (SNNs) to offload more computational workload from the cloud to the edge, thereby improving cost-effectiveness and reducing reliance on the cloud. ECC-SNN employs a joint training approach that integrates ANN and SNN models, enabling edge devices to leverage knowledge from cloud models for enhanced performance while reducing energy consumption and processing latency. Furthermore, ECC-SNN features an on-device incremental learning algorithm that enables edge models to continuously adapt to dynamic environments, reducing the communication overhead and resource consumption associated with frequent cloud update requests. Extensive experimental results on four datasets demonstrate that ECC-SNN improves accuracy by 4.15%, reduces average energy consumption by 79.4%, and lowers average processing latency by 39.1%.
- Conference Article
21
- 10.1109/islped.2019.8824897
- Jul 1, 2019
Spiking Neural Networks (SNNs), which represent information as sequences of spikes, are gaining interest due to the emergence of low-power hardware platforms such as IBM TrueNorth and Intel Loihi, and their intrinsic ability to process temporal streams of data (e.g., outputs from event-based cameras). A spike produced by a neuron in an SNN is an event that triggers updates to the membrane potentials of each of the fanout neurons based on the weight associated with the synaptic connection, possibly resulting in other spikes being generated. The time and energy consumption in SNN implementations are dominated by accesses to the synaptic weights from memory and communication of spikes through the on-chip network. To improve the energy efficiency of SNNs, we therefore propose Dynamic Spike Bundling (DSB), wherein an event to fanout neurons is not generated for every spike; instead, spikes produced by a neuron that occur close in time are dynamically bundled, with a single event being generated for the entire spike bundle. This reduces memory accesses, as the synaptic weight can be fetched just once and reused across all spikes in the bundle. The communication traffic is also reduced as fewer messages are communicated between neurons. To evaluate DSB, we develop B-SNNAP, an event-driven SNN accelerator with hardware support for dynamically bundling spikes with minimal overheads. Across 7 image recognition benchmarks including the CIFAR100 and ImageNet datasets, DSB achieves a 1.15×-3.8× reduction in energy for <0.1% loss in accuracy, and up to 5.1× savings when <1% accuracy loss is tolerable.
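The core of Dynamic Spike Bundling, grouping temporally close spikes from one neuron so the synaptic weight is fetched once per bundle rather than once per spike, can be sketched greedily; the window size and the greedy policy are assumptions for illustration, not B-SNNAP's exact hardware logic:

```python
def bundle_spikes(spike_times, window=2):
    """Greedy bundling sketch: a spike joins the current bundle if it falls
    within `window` time units of the bundle's first spike; otherwise it
    starts a new bundle. Each bundle becomes a single fanout event, so the
    number of weight fetches equals len(bundles), not len(spike_times)."""
    bundles = []
    for t in sorted(spike_times):
        if bundles and t - bundles[-1][0] <= window:
            bundles[-1].append(t)   # reuse the already-fetched weight
        else:
            bundles.append([t])     # new event: one fresh weight fetch
    return bundles
```

For example, five spikes at times `[1, 2, 3, 10, 11]` collapse into two bundles, cutting the weight fetches and network messages for this neuron from five to two.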
- Research Article
5
- 10.1007/s13748-024-00313-4
- Mar 1, 2024
- Progress in Artificial Intelligence
Deep neural networks (DNNs) have received a great deal of interest in solving everyday tasks in recent years. However, their computational and energy costs limit their use on mobile and edge devices. The neuromorphic computing approach called spiking neural networks (SNNs) represents a potential solution for bridging the gap between performance and computational expense. Despite the potential benefits of energy efficiency, current SNNs are being used with datasets such as MNIST, Fashion-MNIST, and CIFAR10, limiting their applications compared to DNNs. Therefore, the applicability of SNNs to real-world applications, such as scene classification and forecasting epileptic seizures, has yet to be demonstrated. This paper develops a deep convolutional spiking neural network (DCSNN) for embedded applications. We explore a convolutional architecture, Visual Geometry Group (VGG16), to implement deeper SNNs. To train a spiking model, we convert the pre-trained VGG16 into corresponding spiking equivalents with nearly comparable performance to the original one. The trained weights of VGG16 were then transferred to the equivalent SNN architecture while performing a proper weight-threshold balancing. The model is evaluated in two case studies: land use and land cover classification, and epileptic seizure detection. Experimental results show a classification accuracy of 94.88%, a seizure detection specificity of 99.45%, and a sensitivity of 95.06%. It is confirmed that conversion-based training of SNNs is promising, and the benefits of DNNs, such as solving complex and real-world problems, become available to SNNs.
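The weight-threshold balancing step mentioned above is commonly done by rescaling each layer's weights with the ratio of observed peak activations, so that spike rates stay within the neurons' firing range after conversion. A minimal sketch under that assumption (not necessarily the authors' exact normalization; `max_acts[i]` stands for layer i's peak ReLU output measured on calibration data):

```python
import numpy as np

def threshold_balance(weights, max_acts):
    """Layer-wise weight-threshold balancing sketch for ANN-to-SNN
    conversion: scale each layer's weights by (previous layer's peak
    activation) / (this layer's peak activation), keeping the firing
    threshold fixed at 1 (illustrative data-driven normalization)."""
    balanced = []
    prev_max = 1.0                      # the input is assumed normalized to [0, 1]
    for w, a_max in zip(weights, max_acts):
        balanced.append(w * prev_max / a_max)
        prev_max = a_max
    return balanced
```

After this rescaling, every layer's pre-activations are bounded by the unit threshold, which is why the converted SNN's rate-coded outputs can track the ANN's activations with small accuracy loss.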
- Conference Article
6
- 10.1109/syscon53073.2023.10131076
- Apr 17, 2023
A Spiking Neural Network (SNN) is a particular form of Artificial Neural Network (ANN). An SNN has features similar to an ANN's, but it uses a different information representation that allows it to achieve higher energy efficiency. This paper presents the design and implementation of an SNN on an FPGA. The SNN model is designed for lower power consumption than existing SNN models in FPGA implementation and for lower accuracy loss than existing training methods on the algorithm side. The coding scheme of the proposed SNN model is rate coding. This paper introduces a conversion method to directly map trained parameters from an ANN to the SNN with negligible classification accuracy loss. It also demonstrates FPGA implementation techniques for a Spiking Exponential Function, a Spiking SoftMax Function, and a Dynamic Adder Tree, and presents a Time Division Component Reuse technique for lower resource utilization in the FPGA implementation of the SNN. The proposed model has a power efficiency of 8841.7 frames per watt with negligible accuracy loss. The benchmark SNN model has a power efficiency of 337.6 frames per watt with an accuracy loss of 1.42 percent. The reference accuracy of the ANN model is 90.36 percent. For comparison, the specific model of the SNN has an accuracy of 90.39 percent.
- Conference Article
- 10.1109/ikt57960.2022.10039008
- Dec 20, 2022
With artificial intelligence's tremendous progress in the past decades, the demand for applying artificial intelligence algorithms and architectures in cloud computing has increased. In this regard, the need for neuromorphic hardware that enables training and processing of data generated by edge devices has also grown. Different algorithms have been presented in this direction, but they consume a lot of energy and area due to the large number of calculations. Therefore, researchers have tried to minimize energy consumption while maintaining accuracy in deep spiking neural networks, the least power-consuming generation of neural networks. To achieve this goal and reduce the number of memory accesses and the space required, various hardware and software methods have been provided. In this article, the best architecture is identified by examining the energy consumption and accuracy of different architectural approaches, and a hybrid method is proposed to reduce energy consumption in spiking neural networks. The proposed hybrid architecture was implemented on the MNIST dataset, showing that power consumption is reduced by almost 1% compared to state-of-the-art architectures. The accuracy of the proposed hybrid algorithm is 95.3%, the highest among architectures using time-based coding.
- Research Article
3
- 10.1007/s44295-024-00040-5
- Sep 10, 2024
- Intelligent Marine Technology and Systems
Large language models are widely used across various applications owing to their superior performance. However, their high computational cost makes deployment on edge devices challenging. Spiking neural networks (SNNs), with their power-efficient, event-driven binary operations, offer a promising alternative, and combining SNNs with transformers is expected to be an effective solution for edge computing. This study proposes an energy-efficient spike transformer accelerator for edge computing, targeting the transformer, the core building block of large language models, and combining the efficiency of SNNs with the performance of transformer models. The design achieves performance levels comparable to traditional transformers while maintaining the lower power consumption characteristic of SNNs. To enhance hardware efficiency, a specialized computation engine and a novel datapath for the spike transformer are introduced. The proposed design is implemented on the Xilinx Zynq UltraScale+ ZCU102 device, demonstrating significant improvements in energy consumption over previous transformer accelerators; it even surpasses some recent binary transformer accelerators in efficiency. Implementation results confirm that the proposed spike transformer accelerator is a feasible solution for running transformer models on edge devices.