Efficient Parallel Implementation of Matrix Multiplication for Lattice-Based Cryptography on Modern ARM Processor

Taehwan Park,Haeryong Park,Howon Kim,Junsub Kim,Hwajeong Seo

doi:10.1155/2018/7012056

Taehwan Park, Haeryong Park + Show 3 more

Open Access

https://doi.org/10.1155/2018/7012056

Copy DOI

Abstract

Recently, various types of postquantum cryptography algorithms have been proposed for the National Institute of Standards and Technology’s Postquantum Cryptography Standardization competition. Lattice-based cryptography, which is based on Learning with Errors, is based on matrix multiplication. A large-size matrix multiplication requires a long execution time for key generation, encryption, and decryption. In this paper, we propose an efficient parallel implementation of matrix multiplication and vector addition with matrix transpose using ARM NEON instructions on ARM Cortex-A platforms. The proposed method achieves performance enhancements of 36.93%, 6.95%, 32.92%, and 7.66%. The optimized method is applied to the Lizard. CCA key generation step enhances the performance by 7.04%, 3.66%, 7.57%, and 9.32% over previous state-of-the-art implementations.

Full Text