Abstract
Recently many studies have been made to map cryptography algorithms onto graphics processors (GPU), and gained great performances. This paper does not focus on the performance of a specific program exploited by using all kinds of optimization methods algorithmically, but the intrinsic reason which lies in GPU architectural features for this performance improvement. Thus we present a study of several block encryption algorithms(AES, TRI-DES, RC5, TWOFISH and the chained block cipher formed by their combinations) processing on GPU using CUDA. We introduce our CUDA implementations, and investigate the program behavioral characteristics and their impacts on the performance in four aspects. We find that the number of threads used by a CUDA program can affect the overall performance fundamentally. Many block encryption algorithms can benefit from the shared memory if the capacity is large enough to hold the lookup tables. The data stored in device memory should be organized purposely to avoid performance degradation. Besides, the communication between host and device may turn out to be the bottleneck of a program. Through these analyses we hope to find out an effective way to optimize a CUDA program, as well as to reveal some desirable architectural features to support block encryption applications better.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.