Abstract
Software implementations of hash functions have so far not offered satisfactory performance for many applications. Moreover, SHA-3 and SHAKE, which builds on SHA-3, are used extensively in many post-quantum cryptosystems (PQC). There is therefore a need for research on optimizing SHA-3 in software environments. We propose an optimized software implementation of SHA-3 for GPU environments. To improve performance, we apply several techniques: optimization of the internal processes of SHA-3, inline PTX optimization, efficient memory usage, and the use of asynchronous CUDA streams. With these optimizations, our SHA-3(512) and SHA-3(256) implementations achieve maximum throughputs of 88.51 Gb/s and 171.62 Gb/s, respectively, on an RTX 2080 Ti GPU without CUDA streams.
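For orientation, the "internal processes" of SHA-3 are the five steps (theta, rho, pi, chi, iota) of the Keccak-f[1600] permutation inside a sponge construction, as specified in FIPS 202. The following is a minimal CPU reference sketch in C++, not the GPU implementation described above; the names `rotl64`, `keccak_f1600`, and `sha3_hex` are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Illustrative CPU reference only; helper names are hypothetical.
static inline std::uint64_t rotl64(std::uint64_t x, unsigned n) {
    return (x << n) | (x >> (64 - n));  // callers only pass n in 1..63
}

// Keccak-f[1600]: 24 rounds over a state of 25 lanes of 64 bits.
static void keccak_f1600(std::uint64_t st[25]) {
    static const std::uint64_t RC[24] = {
        0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
        0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
        0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
        0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
        0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
        0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
        0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
        0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL};
    static const unsigned ROT[24] = {1,  3,  6,  10, 15, 21, 28, 36,
                                     45, 55, 2,  14, 27, 41, 56, 8,
                                     25, 43, 62, 18, 39, 61, 20, 44};
    static const unsigned PI[24] = {10, 7,  11, 17, 18, 3,  5,  16,
                                    8,  21, 24, 4,  15, 23, 19, 13,
                                    12, 2,  20, 14, 22, 9,  6,  1};
    for (int r = 0; r < 24; ++r) {
        std::uint64_t bc[5], t;
        // Theta: mix each lane with the parities of two neighbouring columns.
        for (int i = 0; i < 5; ++i)
            bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
        for (int i = 0; i < 5; ++i) {
            t = bc[(i + 4) % 5] ^ rotl64(bc[(i + 1) % 5], 1);
            for (int j = 0; j < 25; j += 5) st[j + i] ^= t;
        }
        // Rho and Pi: rotate each lane and move it to its new position.
        t = st[1];
        for (int i = 0; i < 24; ++i) {
            const unsigned j = PI[i];
            const std::uint64_t tmp = st[j];
            st[j] = rotl64(t, ROT[i]);
            t = tmp;
        }
        // Chi: the only nonlinear step, applied row by row.
        for (int j = 0; j < 25; j += 5) {
            for (int i = 0; i < 5; ++i) bc[i] = st[j + i];
            for (int i = 0; i < 5; ++i)
                st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
        }
        // Iota: break symmetry with a round constant.
        st[0] ^= RC[r];
    }
}

// SHA3-<bits> of msg as lowercase hex, for bits in {224, 256, 384, 512}.
std::string sha3_hex(const std::string& msg, unsigned bits) {
    const std::size_t out_len = bits / 8;
    const std::size_t rate = 200 - 2 * out_len;  // 136 bytes for 256, 72 for 512
    std::uint64_t st[25] = {0};
    std::size_t pos = 0;
    // XOR one byte into the state using little-endian lane layout.
    auto xor_byte = [&st](std::size_t i, std::uint8_t b) {
        st[i / 8] ^= static_cast<std::uint64_t>(b) << (8 * (i % 8));
    };
    for (unsigned char c : msg) {  // absorb
        xor_byte(pos++, c);
        if (pos == rate) { keccak_f1600(st); pos = 0; }
    }
    xor_byte(pos, 0x06);       // SHA-3 domain separation and first padding bit
    xor_byte(rate - 1, 0x80);  // final padding bit
    keccak_f1600(st);
    static const char* hex = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < out_len; ++i) {  // squeeze (digest fits in one block)
        const unsigned b = (st[i / 8] >> (8 * (i % 8))) & 0xff;
        out += hex[b >> 4];
        out += hex[b & 15];
    }
    return out;
}
```

Calling `sha3_hex("abc", 256)` or `sha3_hex("abc", 512)` yields the digest as a hex string. In this sketch SHA3-256 absorbs 136 bytes per permutation call while SHA3-512 absorbs only 72, which is consistent with the roughly twofold throughput gap between the two variants reported above.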