Abstract

Convolution is a primary operation in convolutional neural networks, and inference speed is largely determined by the speed of the convolutional layers. Improvements in the performance of embedded processors make it feasible to run inference directly on embedded devices. In this article, a pipelining strategy for single instruction, multiple data (SIMD) instructions is proposed to finely optimize the 3 × 3 convolution on ARM-based CPUs. We implement a SIMD group to improve the efficiency of the SIMD pipeline, and a tiling method is exploited to increase data reuse during the computation. An evaluation model is proposed to guide the design of the tiling method and the register allocation. Our implementation runs 5.18 times faster than the unoptimized version compiled with the GNU Compiler Collection (GCC) on the RK3288. The effect of our optimization method is measured with a performance profiling tool; the profiling results indicate that the pipelining strategy is effective for both normal and depthwise separable convolutions. With multithreaded processing, the speedup reaches 18.3 over the single-threaded unoptimized version.
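To make the setting concrete, the following is a minimal illustrative sketch (not the paper's actual implementation) of how one row of taps of a 3 × 3 convolution can be vectorized with ARM NEON intrinsics in C; the function name conv3x3_row_neon and its parameters are hypothetical, and float32 data with an output width that is a multiple of 4 is assumed. Overlapping loads of the same input row across the three horizontal taps hint at the data-reuse and pipelining ideas the abstract describes.

    #include <arm_neon.h>

    /* Illustrative sketch, not the authors' kernel: accumulate one row of
     * 3x3 filter taps into four adjacent outputs at a time using NEON.
     * Calling this three times (once per kernel row, with the matching
     * input row) on a zero-initialized output produces one output row. */
    static void conv3x3_row_neon(const float *in,   /* input row, width >= w + 2 */
                                 const float *kern, /* the 3 taps of this kernel row */
                                 float *out,        /* accumulator row, width w */
                                 int w)             /* output width, multiple of 4 */
    {
        float32x4_t k0 = vdupq_n_f32(kern[0]);
        float32x4_t k1 = vdupq_n_f32(kern[1]);
        float32x4_t k2 = vdupq_n_f32(kern[2]);

        for (int x = 0; x < w; x += 4) {
            float32x4_t acc = vld1q_f32(out + x);
            /* Three overlapping loads cover the three horizontal taps,
             * so each input element is reused by neighboring outputs. */
            float32x4_t i0 = vld1q_f32(in + x);
            float32x4_t i1 = vld1q_f32(in + x + 1);
            float32x4_t i2 = vld1q_f32(in + x + 2);
            acc = vmlaq_f32(acc, i0, k0); /* acc += i0 * k0 */
            acc = vmlaq_f32(acc, i1, k1);
            acc = vmlaq_f32(acc, i2, k2);
            vst1q_f32(out + x, acc);
        }
    }

The paper's contribution lies in how such multiply-accumulate sequences are scheduled (the SIMD pipeline and SIMD group) and how tiles are sized so that inputs, weights, and accumulators stay resident in registers; the sketch above shows only the baseline vectorization those techniques build on.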