The Advanced Encryption Standard (AES) is a standardized block cipher widely used to protect data confidentiality. It can also be used to generate pseudo-random numbers, which has many important applications. Recently, several works have demonstrated efficient implementations of AES in electronic codebook (ECB) and counter (CTR) modes on GPU platforms, achieving high throughput. In this brief, we set a speed record for AES implementation that outperforms previous implementations. In particular, the proposed AES implementation achieves 9% (CTR) and 7% (ECB) higher throughput than the state-of-the-art bit-sliced implementation. Moreover, the proposed technique does not require round keys to be embedded into the code during compilation, which is a serious limitation of earlier work. The proposed technique also achieves up to 63% higher throughput than another recently presented technique. Two use cases are presented to verify the efficiency of the proposed AES implementation. First, AES is used to generate random samples in FrodoKEM, a NIST post-quantum key encapsulation mechanism (KEM), achieving 3,350, 1,503, and 7,716 key exchanges per second on V100, T4, and RTX3080 GPUs, respectively. This makes the proposed FrodoKEM implementation $2.99\times$ faster than the state of the art. Second, the proposed AES implementation is used in an exhaustive key search application, achieving 11,428, 3,969, and 9,998 $\times 10^{6}$ encryptions per second on V100, T4, and RTX3080 GPUs, respectively.