32-bit Instruction Research Articles

In this paper, we propose the first ARIA block cipher implementations on both MSP430 and Advanced RISC Machines (ARM) microcontrollers. To optimize ARIA for the target embedded processors, its core operations, such as the substitution and diffusion layers, are carefully redesigned for both MSP430 (Texas Instruments, Dallas, TX, USA) and ARM Cortex-M3 (STMicroelectronics, Geneva, Switzerland) microcontrollers. In particular, two bytes of ARIA input data are concatenated into one 16-bit word. Each 16-bit word is then processed at once with a 16-bit instruction, which improves performance on the 16-bit MSP430 microcontroller. This approach also halves the number of required registers, memory accesses, and operations compared with 8-bit word-wise implementations. For the ARM Cortex-M3 microcontroller, the 8×32 look-up table based ARIA implementation is further optimized with a novel memory access approach. The memory accesses are finely scheduled to fully utilize the 3-stage pipeline architecture of ARM Cortex-M3 microcontrollers. Furthermore, the counter (CTR) mode of operation is optimized beyond the electronic code book (ECB) mode of operation through pre-computation techniques. Finally, the proposed ARIA implementations on both low-end target microcontrollers (MSP430 and ARM Cortex-M3) achieved (209 and 96 for the 128-bit security level, respectively), (241 and 111 for the 192-bit security level, respectively), and (274 and 126 for the 256-bit security level, respectively). Compared with previous works, the running time on the low-end target microcontrollers (MSP430 and ARM Cortex-M3) is improved by (92.20% and 10.09% for the 128-bit security level, respectively), (92.26% and 10.87% for the 192-bit security level, respectively), and (92.28% and 10.62% for the 256-bit security level, respectively). The proposed ARIA-CTR implementation improved performance by 6.6% and 4.0% compared with the proposed ARIA-ECB implementations for MSP430 and ARM Cortex-M3 microcontrollers, respectively.
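The byte-pairing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the big-endian pairing order is an assumption, and round-key addition (a plain XOR) is used only because it is the simplest ARIA step to show word-wise. The point is that eight 16-bit XORs give the same result as sixteen 8-bit XORs.

```python
def pack16(block):
    """Concatenate adjacent byte pairs into 16-bit words.

    Assumption: the first byte of each pair becomes the high byte.
    """
    if len(block) % 2:
        raise ValueError("block length must be even")
    return [(block[i] << 8) | block[i + 1] for i in range(0, len(block), 2)]


def unpack16(words):
    """Split 16-bit words back into bytes (inverse of pack16)."""
    out = bytearray()
    for w in words:
        out.append((w >> 8) & 0xFF)
        out.append(w & 0xFF)
    return bytes(out)


def add_round_key_bytewise(state, key):
    """Reference version: one XOR per byte (16 XORs for a 128-bit block)."""
    return bytes(s ^ k for s, k in zip(state, key))


def add_round_key_wordwise(state, key):
    """Word-wise version: one XOR per 16-bit word (8 XORs for a 128-bit block)."""
    words = [s ^ k for s, k in zip(pack16(state), pack16(key))]
    return unpack16(words)


if __name__ == "__main__":
    state = bytes(range(16))      # illustrative 128-bit block
    key = bytes([0xA5] * 16)      # illustrative round key, not a real ARIA key
    assert add_round_key_wordwise(state, key) == add_round_key_bytewise(state, key)
    print(len(pack16(state)), "word operations instead of", len(state))
```

On a real MSP430 the packing would be free, since a 16-bit load already fetches two adjacent bytes; the explicit shifts here exist only to make the pairing visible.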

Code discovery has been a main challenge for static binary translation, especially when the source instruction set architecture has variable-length instructions, as the x86 architectures do. Because of data embedded in the code section, such as PC (program counter)-relative data, jump tables, or padding, a binary translator may be misled into translating data as instructions. With variable-length instructions, once a piece of data is mistranslated as instructions, the decoding of subsequent bytes can also go wrong. We are concerned with static binary translation for the very popular Advanced RISC Machine (ARM) architectures. Although ARM is considered a reduced instruction set computer architecture, it allows 32-bit (ARM) instructions and 16-bit (Thumb) instructions to be mixed in the same executable. In addition to having different lengths, ARM and Thumb instructions are located at 4-byte and 2-byte aligned addresses, respectively. Furthermore, because ARM and Thumb instructions share the same encoding space, a 4-byte word can sometimes be decoded as either one ARM instruction or two Thumb instructions. The correct decoding of such a 4-byte word is determined at runtime by the least-significant bit of the program counter. For unstripped binaries, the mapping symbols can be used to identify ARM code regions and Thumb code regions. For stripped binaries, however, such mapping symbols are unavailable. We propose a novel solution to statically translate stripped ARM/Thumb mixed executables. Our solution is implemented in a static binary translator. For code regions whose types cannot be determined with our solution, the binary translator generates multiple versions of translated code, one of which is selected at runtime. The binary translator also includes a series of analyses that enable the removal of most useless code versions. Based on experimental results on stripped ARM/Thumb mixed binaries in the SPEC2006 and Embedded Microprocessor Benchmark Consortium (EEMBC) benchmark suites, our static binary translator achieves impressive performance when migrating them to run on x86 machines, and the space overhead is no more than 10%.
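The decoding ambiguity described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the translator from the paper: the function names are hypothetical, little-endian byte order is assumed, and Thumb instructions are treated as always 16 bits wide, as the abstract describes them.

```python
import struct


def decode_mode(target):
    """Pick the instruction set from the least-significant bit of a branch target.

    Returns (mode, aligned_address, instruction_width_in_bytes).
    """
    if target & 1:
        # Bit 0 set: Thumb code, 2-byte aligned once the flag bit is cleared.
        return ("thumb", target & ~1, 2)
    if target & 3:
        raise ValueError("ARM code must be 4-byte aligned")
    return ("arm", target, 4)


def split_word(word_bytes, thumb):
    """Interpret the same 4 bytes as two 16-bit units or one 32-bit unit."""
    if len(word_bytes) != 4:
        raise ValueError("expected exactly 4 bytes")
    if thumb:
        return list(struct.unpack("<2H", word_bytes))
    return list(struct.unpack("<I", word_bytes))


if __name__ == "__main__":
    raw = bytes.fromhex("70470020")  # arbitrary example bytes
    for target in (0x8001, 0x8000):
        mode, addr, width = decode_mode(target)
        units = split_word(raw, thumb=(mode == "thumb"))
        print(mode, hex(addr), width, [hex(u) for u in units])
```

A static translator cannot observe the runtime value of that bit, which is why the same four bytes may need to be kept in both interpretations until one is ruled out.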

Articles published on 32-bit Instruction

A Convolutional Neural Network Accelerator Based on FPGA

Auto-tuning Fixed-point Precision with TVM on RISC-V Packed SIMD Extension

An Approach for Matrix Multiplication of 32-Bit Fixed Point Numbers by Means of 16-Bit SIMD Instructions on DSP

An Efficient Unstructured Sparse Convolutional Neural Network Accelerator for Wearable ECG Classification Device

The Renesas Automotive Story in the History of the Microprocessor

Compact Implementation of ARIA on 16-Bit MSP430 and 32-Bit ARM Cortex-M3 Microcontrollers

A Review of ARM Processor Architecture History, Progress and Applications

RVCoreP: An Optimized RISC-V Soft Processor of Five-Stage Pipelining

An integrated machine code monitor for a RISC-V processor on an FPGA

Design and Implementation of a 256-Bit RISC-V-Based Dynamically Scheduled Very Long Instruction Word on FPGA

Modern microprocessor built from complementary carbon nanotube transistors.

A reliable PUF in a dual function SRAM

A low-cost synthesizable RISC-V dual-issue processor core leveraging the compressed Instruction Set Extension

Reducing calling convention overhead in object-oriented programming on embedded ARM thumb-2 platforms

Design and implementation of an ASIP-based cryptography processor for AES, IDEA, and MD5

Floating accumulator architecture

On Static Binary Translation of ARM/Thumb Mixed ISA Binaries

Implementation of a Low-Complexity Deeply Embedded Central Processing Unit and System-on-Chip

Simple super-matrix processor: Implementation and performance evaluation
