Abstract

With the advent of IoT and Cloud computing services, the volume of user data to be managed and file data to be transmitted has increased significantly. To protect users’ personal information, it is necessary to encrypt it in a secure and efficient way. Since servers handling many clients or IoT devices have to encrypt large amounts of data in real time without compromising service capability, Graphics Processing Units (GPUs) have been considered a suitable crypto accelerator for processing such large volumes of data. In this paper, we present highly efficient implementations of block ciphers on NVIDIA GPUs (specifically, the Maxwell, Pascal, and Turing architectures) for IoT and Cloud computing applications that handle massive amounts of data. As block cipher algorithms, we choose AES, a representative standard block cipher; LEA, which was recently added to the ISO/IEC 29192-2:2019 standard; and CHAM, a recently developed lightweight block cipher. To maximize parallelism in the encryption process, we utilize the Counter (CTR) mode of operation and customize it to the GPU's characteristics. We apply several optimization techniques tailored to the GPU architecture, such as kernel parallelism, memory optimization, and CUDA streams. Furthermore, we optimize each target cipher according to its algorithmic characteristics by implementing its core operations with handcrafted inline PTX (Parallel Thread eXecution) code, the virtual assembly language of the CUDA platform. With these optimizations, our implementations on an RTX 2070 GPU achieve throughputs of up to 310 Gbps for AES and 2.47 Tbps for LEA, improvements of 10.7% and 67% over the previous best results of 279.86 Gbps and 1.47 Tbps, respectively. For CHAM, ours is the first optimized GPU implementation, and it achieves a throughput of 3.03 Tbps on the RTX 2070.
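The key property exploited here is that, in CTR mode, every keystream block can be computed independently of the others. As a rough illustration only (not the authors' code), the minimal CUDA sketch below assigns one counter block per thread; `encrypt_block` and all other names are hypothetical placeholders, and a real kernel would call the AES, LEA, or CHAM round function instead of the toy mixing shown.

```cuda
#include <cuda_runtime.h>
#include <stdint.h>

// Hypothetical stand-in for the real block cipher (AES/LEA/CHAM would go here);
// this toy ARX-style mixing exists only so the sketch compiles.
__device__ void encrypt_block(const uint32_t rk[4], const uint32_t in[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (in[i] + rk[i]) ^ ((in[(i + 1) & 3] << 7) | (in[(i + 1) & 3] >> 25));
}

// CTR mode: each thread encrypts its own counter block and XORs the keystream
// with the corresponding 16-byte slice of the plaintext, so there is no data
// dependency between threads.
__global__ void ctr_encrypt_kernel(const uint32_t *rk,
                                   const uint32_t *nonce,      // 2-word nonce
                                   const uint32_t *plaintext,  // num_blocks * 4 words
                                   uint32_t *ciphertext,
                                   uint64_t num_blocks)
{
    uint64_t idx = (uint64_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= num_blocks) return;

    // Build the per-thread counter block: nonce || 64-bit block index.
    uint32_t ctr[4] = { nonce[0], nonce[1],
                        (uint32_t)(idx >> 32), (uint32_t)idx };
    uint32_t keystream[4];
    encrypt_block(rk, ctr, keystream);

    // XOR the keystream with the plaintext block.
    for (int i = 0; i < 4; i++)
        ciphertext[idx * 4 + i] = plaintext[idx * 4 + i] ^ keystream[i];
}
```

In the paper's setting, such kernels are additionally pipelined with CUDA streams so that host-device transfers for one chunk of data overlap with the encryption of another.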

Highlights

  • With the development of IoT and Cloud computing services, the amount of data produced by many users and by the growing number of IoT devices has increased significantly

  • In the case of inline PTX, the available instructions differ according to the Graphics Processing Unit (GPU) architecture; we mainly used instructions available on the Kepler architecture and later (see the sketch after this list)

  • The CUDA Toolkit is based on the C programming language, and the CUDA version used is compatible with the GPU architectures of all platforms used in the experiments
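As a concrete example of such architecture-dependent inline PTX (illustrative only, not necessarily the authors' exact code), a 32-bit left rotation, the core operation of ARX ciphers such as LEA and CHAM, can be expressed with the funnel-shift instruction `shf.l.wrap.b32`, which is available on Kepler-class GPUs and later:

```cuda
#include <stdint.h>

// 32-bit rotate-left via the funnel-shift PTX instruction (available from the
// Kepler architecture onward). The pair {x, x} is shifted left by n and the
// upper 32 bits are returned, which is exactly a left rotation of x by n.
__device__ __forceinline__ uint32_t rotl32(uint32_t x, uint32_t n)
{
    uint32_t r;
    asm("shf.l.wrap.b32 %0, %1, %2, %3;" : "=r"(r) : "r"(x), "r"(x), "r"(n));
    return r;
}
```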


Summary

Introduction

With the development of IoT and Cloud computing services, the amount of data produced by many users and by the growing number of IoT devices has increased significantly. Unlike the latest CPUs, which have dozens of cores, GPUs have hundreds to thousands of cores, making them well suited for computation-intensive workloads such as cryptographic operations. Block cipher algorithms such as AES [1], CHAM [2], and LEA [3] have been optimized for the GPU environment [4,5,6,7,8,9,10,11,12]. Optimization of lightweight block cipher algorithms on the GPU needs to be studied with the IoT server environment in mind. Since the devices that communicate over the network are becoming smaller, we demonstrate the feasibility of large-capacity data encryption by optimizing computation-based cipher algorithms that do not use lookup tables.
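To illustrate what such a table-free design looks like on a GPU, the round function of LEA [3] uses only XOR, 32-bit modular addition, and rotation, so the state and round keys stay entirely in registers with no memory lookups. The sketch below (rotation amounts as given in the LEA specification; function and variable names are illustrative, not the authors' code) shows one round:

```cuda
#include <stdint.h>

__device__ __forceinline__ uint32_t rol32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }
__device__ __forceinline__ uint32_t ror32(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

// One LEA round as specified in [3]: only XOR, 32-bit addition, and rotation
// are used, so no lookup tables are needed. rk points to the six 32-bit
// round-key words of this round; x holds the four 32-bit state words.
__device__ void lea_round(uint32_t x[4], const uint32_t rk[6])
{
    uint32_t t0 = rol32((x[0] ^ rk[0]) + (x[1] ^ rk[1]), 9);
    uint32_t t1 = ror32((x[1] ^ rk[2]) + (x[2] ^ rk[3]), 5);
    uint32_t t2 = ror32((x[2] ^ rk[4]) + (x[3] ^ rk[5]), 3);
    x[3] = x[0];
    x[0] = t0;
    x[1] = t1;
    x[2] = t2;
}
```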

Related Works
Background
Overview of Target Algorithms
Counter Modes of Operation
GPU Architecture Overview
Target GPU Architecture
Proposed Implementation Techniques
Parallel Encryption in CTR Mode
Common Memory Management in GPU Kernel
Memory Optimization Technique in AES
Memory Optimization Technique in CHAM and LEA
Pipelining Encryption with CUDA Stream
Application of Inline PTX Assembly Codes
Experiment Environment
Experiment Results
Comparison
Conclusions
