IoT End-nodes Research Articles

Designing high-speed and energy-efficient blocks for image and digital signal processing (DSP) architectures is an evolving research field. This work designs a high-speed and energy-efficient multiply-accumulate (MAC) unit to improve the performance of field-programmable gate array (FPGA)-based accelerators and softcore processors. Three distinct 32-bit fixed-point signed MAC architectures were designed in Verilog and synthesized for the Zynq 7000 ZedBoard to identify an efficient MAC architecture. The ultimate goal is a fast and energy-efficient MAC unit that can reach speeds up to that of the DSP48 block, reducing the latency of IoT edge computing. For the proposed Booth radix-4 Dadda (BR4D)-based MAC, energy efficiency was achieved in the partial product generation (PPG) and partial product addition (PPA) stages. In the PPG stage, the width of the partial product (PP) terms was optimized with Bewick’s sign extension to reduce power consumption. In the PPA stage, the PP rows were reduced with Dadda-based addition to lower the critical path delay (CPD). The proposed BR4D MAC unit reduces dynamic power, CPD, power-delay product (PDP) and energy-delay product (EDP) by 22%, 9%, 29% and 36%, respectively, compared with the standard Booth radix-4 Wallace tree (BR4WT)-based MAC. Furthermore, the hybrid MACs (BR4WT and BR4D) were compared with current state-of-the-art (SoA) designs, and the proposed BR4D MAC was found to be 47% faster than the same design in the SoA. To evaluate its performance for low-power applications, the proposed BR4D was tested under frequency scaling, with the frequency reduced in steps of 10 MHz from a maximum usable frequency (MUF) of 64 MHz down to 10 MHz. Reducing the clock frequency by 84% reduces power consumption in the same proportion and speed by 38%. Additionally, the proposed design helps to improve the battery life of IoT end nodes by reducing energy consumption and EDP by 76% and 61%, respectively.
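
As a purely behavioral illustration of the Booth radix-4 recoding named in the abstract above, the Python sketch below recodes a signed multiplier into digits in {-2, -1, 0, 1, 2}, which halves the number of partial-product rows (16 rows for a 32-bit operand), and then accumulates the product. It is not the authors' Verilog design: Bewick's sign extension and the Dadda reduction tree are not modeled, the rows are simply summed, and the function names `booth_radix4_digits` and `booth_mac` are hypothetical.

```python
from typing import List


def booth_radix4_digits(multiplier: int, width: int = 32) -> List[int]:
    """Recode a signed `width`-bit multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digit k carries weight 4**k.
    `width` is assumed to be even.
    """
    # Two's-complement bit pattern of the multiplier.
    bits = multiplier & ((1 << width) - 1)
    digits = []
    prev = 0  # implicit zero bit to the right of the LSB
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        # Overlapping 3-bit group (b1, b0, prev) maps to -2*b1 + b0 + prev.
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_mac(acc: int, a: int, b: int, width: int = 32) -> int:
    """Return acc + a * b, forming the product from Booth radix-4 rows."""
    # One partial-product row per digit: width/2 rows instead of width.
    rows = [(d * a) << (2 * k)
            for k, d in enumerate(booth_radix4_digits(b, width))]
    # A hardware design would compress these rows with a Wallace or Dadda
    # tree; here they are simply summed.
    return acc + sum(rows)


if __name__ == "__main__":
    print(booth_radix4_digits(13, width=8))
    print(booth_mac(10, -7, 13))
```

Python integers are unbounded, so the sketch ignores accumulator overflow, which a fixed-width hardware MAC would have to handle.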

FPGAs, because of their energy efficiency, reconfigurability, and easily tunable HLS designs, have been used to accelerate an increasing number of machine learning applications, especially CNN-based ones. As a representative example, IoT Edge applications, which require low-latency processing of resource-hungry CNNs, offload inferences from resource-limited IoT end nodes to Edge servers featuring FPGAs. However, the ever-increasing number of end nodes confronts these FPGA-based servers with new performance and adaptability challenges. While some works have exploited CNN optimizations to alleviate the computation and memory burdens of inference, others have exploited HLS to tune accelerators for statically defined optimization goals. However, these works have not tackled CNN and HLS optimizations together, nor have they provided any adaptability at runtime, where the workload’s characteristics are unpredictable. In this context, we propose a hybrid two-step approach that, first, creates new optimization opportunities at design time through the automatic training of CNN model variants (obtained via pruning) and the automatic generation of versions of convolutional accelerators (obtained during HLS synthesis); and, second, synergistically exploits these CNN and HLS optimization opportunities to deliver a fully dynamic Multi-FPGA system that adapts its resources in a fully automatic or user-configurable manner. We implement this two-step approach as the AdaServ Framework and show, through a smart video surveillance Edge application as a case study, that it adapts to the always-changing Edge conditions: AdaServ processes at least 3.37× more inferences (using the automatic approach) and is at least 6.68× more energy-efficient (using the user-configurable approach) than the original convolutional accelerators and CNN models (VGG-16 and AlexNet). We also show that AdaServ achieves better results than solutions that dynamically change only the CNN model or only the HLS version, highlighting the importance of exploring both, and that it is always better than the best statically chosen CNN model and HLS version, showing the need for dynamic adaptability.
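
The runtime adaptation described above can be pictured as choosing, for the current load, one pairing of a pruned CNN variant with an HLS accelerator version. The Python sketch below is a minimal illustration under stated assumptions, not AdaServ's actual policy: the `Config` fields, the `select_config` rule (lowest energy among pairings that meet demand, otherwise highest throughput), and every label and number are hypothetical placeholders rather than measured results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Config:
    """One (CNN variant, HLS version) pairing; all fields are hypothetical."""
    cnn_variant: str       # label of a pruned model variant
    hls_version: str       # label of a generated accelerator version
    throughput: float      # inferences per second (placeholder value)
    energy_per_inf: float  # joules per inference (placeholder value)


def select_config(configs: Sequence[Config], demand: float) -> Config:
    """Pick the lowest-energy pairing that sustains `demand` inferences/s.

    If no pairing can keep up, fall back to the highest-throughput one.
    """
    feasible = [c for c in configs if c.throughput >= demand]
    if feasible:
        return min(feasible, key=lambda c: c.energy_per_inf)
    return max(configs, key=lambda c: c.throughput)


# Placeholder candidates, invented purely for illustration.
candidates = [
    Config("unpruned", "hls_v0", throughput=10.0, energy_per_inf=2.0),
    Config("pruned_a", "hls_v1", throughput=25.0, energy_per_inf=1.2),
    Config("pruned_b", "hls_v2", throughput=40.0, energy_per_inf=1.5),
]

if __name__ == "__main__":
    for demand in (8.0, 30.0, 100.0):
        chosen = select_config(candidates, demand)
        print(demand, chosen.cnn_variant, chosen.hls_version)
```

A real system would also have to weigh accuracy loss from pruning and the cost of reconfiguring an FPGA, both of which this sketch leaves out.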

Articles published on IoT End-nodes

Designing Low-Power RISC-V Multicore Processors With a Shared Lightweight Floating Point Unit for IoT Endnodes

FPGA-Based High-Speed Energy-Efficient 32-Bit Fixed-Point MAC Architecture for DSP Application in IoT Edge Computing

Developing a Grover's quantum algorithm emulator on standalone FPGAs: optimization and implementation

Vega: A Ten-Core SoC for IoT Endnodes With DNN Acceleration and Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode

Synergistically Exploiting CNN Pruning and HLS Versioning for Adaptive Inference on Multi-FPGAs at the Edge

XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes

Efficient Transform Algorithms for Parallel Ultra-Low-Power IoT End Nodes

ESco: Eligibility score-based strategy for sensors selection in CR-IoT: Application to LoRaWAN

Security Framework for IoT End Nodes with Neural Networks

Mr.Wolf: An Energy-Precision Scalable Parallel Ultra Low Power SoC for IoT Edge Processing

Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine
