In Internet of Things services, various types of embedded devices are employed. Among them, ARM-based devices have been widely used as clients. Since these devices communicate with each other in wirelessly, transmitted data needs to be protected with secure block ciphers. Recently, several Add-Rotate-XOR (ARX)-based block ciphers, such as HIGHT and revised CHAM, have been developed for efficient encryption on embedded devices. In this paper, we present secure and fast implementations of ARX-based block ciphers HIGHT and revised CHAM in ARMv8 platforms. For performance efficiency, we basically apply task and data parallel processing mechanism by fully utilizing NEON architecture embedded in ARMv8 platforms. Typically, it is required to duplicate round key in NEON register to utilize the NEON architecture to process multiple data blocks simultaneously. In our implementations, we propose an optimal approach minimizing the cost of round key duplication and efficient key scheduling for task parallelism. For secure implementation, we develop efficient software countermeasures against realistic fault attack models. Thus, we present efficient software countermeasure based on intra-instruction redundancy. Especially, we propose enhanced random shuffling method which is the core operation for the proposed countermeasure. With the proposed random shuffling method, we can significantly reduce the overhead for preventing fault attacks. We present two versions of the software: a version providing highly fast (HF) performance without fault attack countermeasures and a version providing highly secure (HS) against fault attacks. Compared with referenced software, HF with HIGHT, revised CHAM-64/128, CHAM-128/128, and CHAM-128/256 provides about 8 times, 38 times, 13 times and 13 times of enhanced performance, respectively. Compared with previous best results having fault attack countermeasure, HS with HIGHT, revised CHAM-64/128, CHAM-128/128, and CHAM-128/256 provides about 50%, 30%, 80%, and 70% of enhanced performance, respectively. Both our HS and HF achieve better performance and higher security compared with related works.
Read full abstract