Power Overhead Research Articles

This article presents a low-power, programmable, domain-specific manycore accelerator, the Binarized neural Network Manycore Accelerator (BiNMAC), which efficiently executes neural network models with binary-precision weights and activations. Such networks have compact models in which each weight is constrained to 1 bit, so several weights can be packed into one memory entry, minimizing the memory footprint. Packing weights also makes it possible to execute single instruction, multiple data (SIMD) operations with simple circuitry, which improves performance and efficiency. The proposed BiNMAC has lightweight cores that support domain-specific instructions and a router-based memory access architecture that supports efficient implementation of the layers of binary-precision weight/activation neural networks of suitable size. With only 3.73% area overhead and 1.98% average power overhead, novel instructions such as Combined Population-Count-XNOR, Patch-Select, and Bit-based Accumulation are added to the instruction set architecture of the BiNMAC. Each executes in 1 clock cycle a frequently used function that would otherwise have taken 54, 4, and 3 clock cycles, respectively. Additionally, customized logic is added to every core to transpose 16×16-bit blocks of memory at the bit level, which speeds up reshaping intermediate data so that it is well aligned for bitwise operations. A 64-cluster architecture of the BiNMAC is fully placed and routed in 65-nm TSMC CMOS technology, where a single cluster occupies an area of 0.53 mm² with an average power of 232 mW at a 1-GHz clock frequency and 1.1 V. The 64-cluster architecture takes 36.5 mm² of area and, if fully exploited, consumes a total power of 16.4 W and can perform 1,360 Giga Operations Per Second (GOPS) while providing full programmability.
To demonstrate its scalability, four binarized case studies, including ResNet-20 and LeNet-5 for high-performance image classification as well as a ConvNet and a multilayer perceptron for low-power physiological applications, were implemented on BiNMAC. The implementation results indicate that the population-count instruction alone speeds up execution by approximately 5×. When the other new instructions are added to a RISC machine that already has a population-count instruction, performance increases by 58% on average. To compare the BiNMAC with commercial off-the-shelf platforms, the case studies with their double-precision floating-point models are also implemented on the NVIDIA Jetson TX2 SoC (CPU+GPU). The results indicate that, within a margin of ∼2.1% to 9.5% accuracy loss, BiNMAC on average outperforms the TX2 GPU by approximately 1.9× (or 7.5× with fabrication technology scaled) in energy consumption for image classification applications. In low-power settings and within a margin of ∼3.7% to 5.5% accuracy loss compared to the ARM Cortex-A57 CPU implementation, BiNMAC is roughly 9.7× to 17.2× (or 38.8× to 68.8× with fabrication technology scaled) more energy efficient for physiological applications while meeting the application deadline.
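The idea behind packing 1-bit weights and combining XNOR with a population count can be illustrated in software. The following is a minimal sketch, not BiNMAC's instruction set or hardware: it assumes the common binarized-network encoding in which bit 1 stands for +1 and bit 0 for -1, and the helper names `pack_bits` and `xnor_popcount_dot` are hypothetical.

```python
def pack_bits(values):
    """Pack +1/-1 values into one integer: bit i is 1 when values[i] == +1.

    The encoding (1 -> +1, 0 -> -1) is an assumption of this sketch.
    """
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word, w_word, n_bits):
    """Dot product of two packed +1/-1 vectors of length n_bits."""
    mask = (1 << n_bits) - 1
    agree = ~(a_word ^ w_word) & mask  # XNOR: 1 wherever the two bits match
    matches = bin(agree).count("1")    # population count of matching positions
    return 2 * matches - n_bits        # matches minus mismatches


# Compare the packed result with an ordinary element-wise dot product.
activations = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [1, 1, -1, 1, -1, 1, 1, -1]
packed_result = xnor_popcount_dot(
    pack_bits(activations), pack_bits(weights), len(weights)
)
reference = sum(a * w for a, w in zip(activations, weights))
print(packed_result, reference)
```

Each matching bit pair contributes +1 to the dot product and each mismatch contributes -1, which is why the result is twice the match count minus the vector length. In this sketch the whole vector is processed with one XOR, one inversion, and one bit count, which is the kind of sequence a combined instruction could collapse into a single step.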


Blockchain has received a lot of attention recently for characteristics such as decentralization, immutability, and traceability, which make it a promising technology for developing various applications, especially for managing digital information. However, most current blockchain systems exhibit problems such as high computational overhead and centralization of power. Reliance on cryptocurrency in many public blockchain-based applications is another factor that has hindered the application of blockchain technology outside the financial sector. This paper proposes a new blockchain consensus mechanism based on the contributions of participants. The proposed consensus mechanism, called proof-of-contribution (PoC), quantifies user behaviors and actions in a blockchain-based application as contribution values calculated through an algorithm. The node that has the highest contribution value in each round of consensus gets the right to generate the next block. PoC preserves the properties of decentralization and resistance to hard forks and does not rely on cryptocurrency, making it more attractive than cryptocurrency-based consensus mechanisms such as proof-of-work (PoW) for the wide variety of applications that do not need to involve cryptocurrency. Contribution values can be abstracted from applications and used in the underlying blockchain consensus process to improve the security and trustworthiness of the applications. Intellectual property (IP) protection is one application to which blockchain technology and the PoC consensus mechanism can be applied. However, existing blockchain-based IP protection systems are mostly built on public blockchain platforms such as Ethereum, while some others are developed as consortium blockchains, and they therefore exhibit inherent defects such as high system requirements, long consensus delay, and low user participation.
In this paper, we will use IP protection as the application scenario to illustrate the development of our PoC consensus mechanism and will compare PoC to some existing consensus mechanisms. Experimental results will show that the proposed PoC consensus mechanism preserves most of the main security characteristics of blockchain and is superior to existing consensus mechanisms, making it more secure and efficient to use blockchain technology for digital information management.
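The selection rule described above, in which the node with the highest contribution value in a round earns the right to generate the next block, can be sketched as follows. This is an illustration only: the abstract does not specify the scoring algorithm, so the action names, the weights in `ACTION_WEIGHTS`, and the tie-breaking rule are all hypothetical assumptions.

```python
# Hypothetical action weights; the abstract does not give the scoring algorithm.
ACTION_WEIGHTS = {
    "register_work": 5.0,
    "verify_work": 2.0,
    "report_infringement": 3.0,
}


def contribution_value(action_counts):
    """Weighted sum of a node's recorded actions (an assumed scoring rule)."""
    return sum(
        ACTION_WEIGHTS.get(action, 0.0) * count
        for action, count in action_counts.items()
    )


def select_block_producer(node_actions):
    """Return the node with the highest contribution value and all scores.

    Ties go to the lexicographically smallest node id, which is an
    assumption of this sketch and not a rule taken from the paper.
    """
    scores = {
        node: contribution_value(actions)
        for node, actions in node_actions.items()
    }
    winner = min(scores, key=lambda node: (-scores[node], node))
    return winner, scores


round_actions = {
    "node-a": {"register_work": 2},
    "node-b": {"verify_work": 3, "report_infringement": 2},
    "node-c": {"register_work": 1, "verify_work": 1},
}
winner, scores = select_block_producer(round_actions)
print(winner, scores)
```

A deterministic tie-break matters in any sketch of this kind, because every node must reach the same decision from the same contribution values without further communication.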


Articles published on Power Overhead

A 3.66 μW 12-bit 1 MS/s SAR ADC with mismatch and offset foreground calibration

Slope-assisted Based Fast Strain Measurement Method for Power Overhead Lines of Distribution Internet of Things in Electricity

A High-Linearity Adaptive-Bias SiGe Power Amplifier for 5G Communication

A Radiation-Hardened CMOS Full-Adder Based on Layout Selective Transistor Duplication

Low mismatch high-speed charge pump for high bandwidth phase locked loops

CONCEALING-Gate: Optical Contactless Probing Resilient Design

Halide perovskite memristors as flexible and reconfigurable physical unclonable functions

Data-Level Parallelism Oriented Memory Access and On-Chip Buffering Mechanisms for a Loop Accelerator

Optimal Multi-Operation Energy Management in Smart Microgrids in the Presence of RESs Based on Multi-Objective Improved DE Algorithm: Cost-Emission Based Optimization

Design and analysis of high performance and low power FFT for DSP datapath using Vedic Multipliers

Binary Precision Neural Network Manycore Accelerator

Opportunistic Caching in NoC: Exploring Ways to Reduce Miss Penalty

Frequency-Domain-Multiplexing Single-Wire Interface and Harmonic-Rejection-Based IF Data De-Multiplexing in Millimeter-Wave MIMO Arrays

DYRE: a DYnamic REconfigurable solution to increase GPGPU’s reliability

A holistic approach to power efficiency in a clock offset based Intrusion Detection Systems for Controller Area Networks

Security Assessment of Dynamically Obfuscated Scan Chain Against Oracle-guided Attacks

Real-Time Error Detection in Nonlinear Control Systems Using Machine Learning Assisted State-Space Encoding

Freezer: A Specialized NVM Backup Controller for Intermittently Powered Systems

Proof-of-Contribution consensus mechanism for blockchain and its application in intellectual property protection

Digitally Assisted Secondary Switch-and-Compare Technique for a SAR ADC
