Abstract
In this paper we propose a parallel implementation of a Voronoi cell-based algorithm for the Shortest Vector Problem, targeting both CPU and GPU architectures. Additionally, we present an algorithmic simplification aimed at significantly reducing the memory usage of the implementation. According to our tests, the parallel multi-core CPU implementation scales linearly with the number of cores used and also benefits from simultaneous multi-threading, achieving a maximum speedup of $5.56\times$ with 8 threads. The parallel GPU implementation achieves a speedup of $13.08\times$ over the sequential CPU implementation. Accelerating this class of signal processing algorithms is a fundamental step in the evolution of post-quantum cryptanalysis: currently, the best algorithms can take months to run for moderately low dimensions.