Highly Vectorized SIKE for AVX-512

Hao Cheng,Peter Y A Ryan,Georgios Fotiadis,Johann Großschädl

doi:10.46586/tches.v2022.i2.41-68

Abstract

It is generally accepted that a large-scale quantum computer would be capable of breaking any public-key cryptosystem used today, thereby posing a serious threat to the security of the Internet’s public-key infrastructure. The US National Institute of Standards and Technology (NIST) addresses this threat with an open process for the standardization of quantum-safe key establishment and signature schemes, which is now in the final phase of the evaluation of candidates. SIKE (an abbreviation of Supersingular Isogeny Key Encapsulation) is one of the alternate candidates under evaluation and distinguishes itself from other candidates due to relatively short key lengths and relatively high computing costs. In this paper, we analyze how the latest generation of Intel’s Advanced Vector Extensions (AVX), in particular AVX-512IFMA, can be used to minimize the latency (resp. maximize the throughput) of the SIKE key encapsulation mechanism when executed on Ice Lake CPUs based on the Sunny Cove microarchitecture. We present various techniques to parallelize and speed up the base/extension field arithmetic, point arithmetic, and isogeny computations performed by SIKE. All these parallel processing techniques are combined in AvxSike, a highly optimized implementation of SIKE using Intel AVX-512IFMA instructions. Our experiments indicate that AvxSike instantiated with the SIKEp503 parameter set is approximately 1.5 times faster than the to-date best AVX-512IFMA-based SIKE software from the literature. When executed on an Intel Core i3-1005G1 CPU, AvxSike outperforms the x64 assembly implementation of SIKE contained in Microsoft’s SIDHv3.4 library by a factor of about 2.5 for key generation and decapsulation, while the encapsulation is even 3.2 times faster.

Highlights

In 2016, the National Institute of Standards and Technology (NIST) became engaged in Post-Quantum Cryptography (PQC) and started an initiative to solicit, evaluate, and standardize quantum-safe public-key cryptographic algorithms [CJL+16].
Due to our optimizations for encapsulation, AvxSike-LL reaches a 3.2-fold higher encapsulation speed compared to SIDHv3.4, which can be beneficial for, e.g., server-side TLS processing since, when Supersingular Isogeny Key Encapsulation (SIKE) is integrated into TLS, the server has to perform encapsulations.
By developing sophisticated vector processing techniques for field arithmetic, point arithmetic, and isogeny computations, all of which are integrated into our AvxSike software, we were able to significantly improve both the latency and the throughput of SIKE on modern Intel processors.

Summary

Introduction

In 2016, NIST became engaged in Post-Quantum Cryptography (PQC) and started an initiative to solicit, evaluate, and standardize quantum-safe public-key cryptographic algorithms [CJL+16]. There currently exists only one publication dealing with AVX-512 optimizations for SIKE, namely the ARITH 2019 paper of Kostic and Gueron [KG19], but their work focuses solely on the low-level field arithmetic, i.e. they did not explore avenues for parallel processing at the higher levels of SIKE. It is still unknown how AVX-512IFMA can be exploited to unleash the full potential of modern Intel processors for executing SIKE and what latency [...]. At the highest layer, we discuss various approaches for vectorized isogeny computation and key encapsulation. All these parallel processing techniques are combined in AvxSike, an optimized implementation of SIKE using Intel’s AVX-512IFMA instructions. Our latency-optimized AvxSike instantiated with the SIKEp503 parameters is about 1.5 times faster than the AVX-512IFMA-based SIKE software presented in [KG19]. It outperforms Microsoft’s x64 assembly implementation of SIKE by a factor of about 2.5 for both key generation and decapsulation, and even 3.2 for encapsulation, when benchmarked on an Intel Core i3-1005G1 processor.
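The low-level field arithmetic mentioned above builds on the two AVX-512IFMA multiply-add instructions, VPMADD52LUQ and VPMADD52HUQ, which add the low or the high 52 bits of a 104-bit product to a 64-bit accumulator lane. The following Python sketch emulates a single lane of each instruction and uses them for a schoolbook multi-limb product. It is an illustration only, not the AvxSike code: the 51-bit limb width, the helper names (`madd52lo`, `madd52hi`, `to_limbs`, `from_limbs`, `limb_mul`), and the schoolbook strategy are assumptions made for this example, and no modular reduction is performed.

```python
RADIX_BITS = 51            # limb width assumed for this sketch
MASK52 = (1 << 52) - 1     # IFMA multiplies the low 52 bits of each lane
MASK64 = (1 << 64) - 1     # each lane is a 64-bit accumulator


def madd52lo(acc, a, b):
    """One lane of VPMADD52LUQ: acc + low 52 bits of the 104-bit product."""
    prod = (a & MASK52) * (b & MASK52)
    return (acc + (prod & MASK52)) & MASK64


def madd52hi(acc, a, b):
    """One lane of VPMADD52HUQ: acc + high 52 bits of the 104-bit product."""
    prod = (a & MASK52) * (b & MASK52)
    return (acc + (prod >> 52)) & MASK64


def to_limbs(x, n):
    """Split a non-negative integer into n limbs of RADIX_BITS bits each."""
    limb_mask = (1 << RADIX_BITS) - 1
    return [(x >> (RADIX_BITS * i)) & limb_mask for i in range(n)]


def from_limbs(limbs):
    """Recombine limbs (which may exceed RADIX_BITS bits) into an integer."""
    return sum(v << (RADIX_BITS * i) for i, v in enumerate(limbs))


def limb_mul(a, b):
    """Schoolbook product of two limb vectors using only the two emulated ops."""
    z = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            z[i + j] = madd52lo(z[i + j], ai, bj)
            # The high half has weight 2^52 = 2 * 2^51, so with 51-bit limbs
            # it must be accumulated twice into the next column.
            z[i + j + 1] = madd52hi(z[i + j + 1], ai, bj)
            z[i + j + 1] = madd52hi(z[i + j + 1], ai, bj)
    return z


# Usage: multiply two 503-bit operands held in ten 51-bit limbs each.
x = (1 << 503) - 12345
y = (1 << 502) + 987654321
product_limbs = limb_mul(to_limbs(x, 10), to_limbs(y, 10))
```

With ten limbs per operand, each column accumulates at most ten low halves and twenty high halves, which stays far below 2^64, so the carries can be deferred until after the whole product has been accumulated. This headroom is the usual motivation for choosing a limb width slightly smaller than the 52-bit multiplier width.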

Preliminaries

Optimized Isogeny Computations

Intel Advanced Vector Extension AVX-512

Prime-Field Arithmetic

Radix-2^51 Representation

Results and Comparison

Quadratic Extension-Field Arithmetic

Fp2-Multiplication

Fp2-Squaring

Fp2-Addition and Subtraction

Montgomery Elliptic Curve Arithmetic

Three-Point Ladder

Point Doubling and Tripling

Isogeny Generation

Isogeny Evaluation

Low-Latency Implementation

High-Throughput Implementation

Experimental Results

Conclusions


Journal: IACR Transactions on Cryptographic Hardware and Embedded Systems	Publication Date: Feb 15, 2022
Citations: 4	License type: CC BY 4.0


