High Throughput, low cost, Fully Pipelined Architecture for AES Crypto Chip

Nalini Iyer, D.V. Poornaiah, P.V. Anandmohan, V.D. Kulkarni

doi:10.1109/indcon.2006.302814

Abstract

Reprogrammable devices such as field-programmable gate arrays (FPGAs) are highly attractive options for hardware implementations of encryption algorithms. This paper proposes compact, memoryless, high-speed hardware architectures for the Rijndael AES encryptor/decryptor, with a combined data path, resource sharing, and logic optimization, for novel networking applications. Architectural optimization exploits pipelining, loop unrolling, and sub-pipelining: speed is increased by processing multiple rounds simultaneously, at the cost of increased area. Algorithmic optimization exploits algorithmic strengths within each round unit. Methods such as resource sharing between the encryptor and decryptor and common-subexpression elimination are presented for realizing the transformations in each round unit, reducing both the critical path and the area. The advantage of sub-pipelining can be exploited further by eliminating the unbreakable delay incurred by the look-up tables of the conventional, widely used S-box implementation; the proposed S-box uses combinational logic only. We explore the use of subfield arithmetic for efficient implementations of Galois field operations such as multiplication and inversion. Our technique maps field elements to a composite-field representation chosen to minimize the computational cost of the relevant arithmetic. The method results in a very compact and fast gate-level circuit for Rijndael encryption and decryption. The pipelined architecture can toggle between encryption and decryption modes without any dead cycle.
Using the proposed architecture, a fully sub-pipelined AES-128 core with both inner and outer round pipelining and five sub-stages in each round unit, implemented on Virtex-E devices, achieves a throughput of 26.64 Gbps at 206.84 MHz using 11720 CLB slices in non-feedback modes. The reconfigurable logic area of the complete cipher is reduced by up to 30%, and the S-box area by 64%. The design is faster and more efficient than the fastest previous FPGA implementation known to date.
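
The composite-field idea described in the abstract can be illustrated with a short software sketch. It is not the authors' circuit: the field polynomials (x^4 + x + 1 for GF(2^4), y^2 + y + LAMBDA with LAMBDA = 0x9 for the extension) and all function names are assumptions chosen for illustration, and the basis-change maps are found here by search rather than taken from the paper. The sketch computes the AES S-box without a 256-entry table, by mapping a byte into GF((2^4)^2), inverting there with only 4-bit operations, mapping back, and applying the standard AES affine transformation.

```python
# Sketch only: the S-box via inversion in an assumed composite field GF((2^4)^2).
GF16_POLY = 0x13   # x^4 + x + 1 (assumed choice of subfield polynomial)
LAMBDA = 0x9       # assumed constant; y^2 + y + LAMBDA is irreducible over GF(2^4)

def gf16_mul(a, b):
    """Multiply two 4-bit elements of GF(2^4) modulo x^4 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x10:
            a ^= GF16_POLY
        b >>= 1
    return r

def gf16_inv(a):
    """Inverse in GF(2^4) as a^14; zero maps to zero."""
    r = 1
    for _ in range(14):
        r = gf16_mul(r, a)
    return r

def comp_mul(a, b):
    """Multiply h*y + l elements (packed as h<<4 | l), using y^2 = y + LAMBDA."""
    h1, l1, h2, l2 = a >> 4, a & 0xF, b >> 4, b & 0xF
    hh = gf16_mul(h1, h2)
    h = hh ^ gf16_mul(h1, l2) ^ gf16_mul(h2, l1)
    l = gf16_mul(l1, l2) ^ gf16_mul(LAMBDA, hh)
    return (h << 4) | l

def comp_inv(a):
    """Invert h*y + l using the norm d = LAMBDA*h^2 + h*l + l^2 in GF(2^4)."""
    h, l = a >> 4, a & 0xF
    d = gf16_mul(LAMBDA, gf16_mul(h, h)) ^ gf16_mul(h, l) ^ gf16_mul(l, l)
    d_inv = gf16_inv(d)
    return (gf16_mul(h, d_inv) << 4) | gf16_mul(h ^ l, d_inv)

def comp_pow(a, n):
    r = 1
    for _ in range(n):
        r = comp_mul(r, a)
    return r

def is_aes_root(b):
    """True if b satisfies x^8 + x^4 + x^3 + x + 1 = 0 in the composite field."""
    return (comp_pow(b, 8) ^ comp_pow(b, 4) ^ comp_pow(b, 3) ^ b ^ 1) == 0

# Basis change: send the AES generator to a root BETA of the AES polynomial.
BETA = next(b for b in range(2, 256) if is_aes_root(b))
POWERS = [comp_pow(BETA, i) for i in range(8)]

def to_composite(a):
    """Linear map GF(2^8) -> GF((2^4)^2); an XOR network in hardware."""
    r = 0
    for i in range(8):
        if (a >> i) & 1:
            r ^= POWERS[i]
    return r

# Columns of the inverse linear map, found by search for this sketch.
COLS = [next(a for a in range(256) if to_composite(a) == (1 << i)) for i in range(8)]

def from_composite(c):
    r = 0
    for i in range(8):
        if (c >> i) & 1:
            r ^= COLS[i]
    return r

def gf256_inv(x):
    """Inverse in GF(2^8), computed entirely through the composite field."""
    return from_composite(comp_inv(to_composite(x)))

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox(x):
    """AES S-box: field inversion followed by the standard affine transform."""
    inv = gf256_inv(x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63

def gf256_mul(a, b):
    """Reference multiply modulo the AES polynomial 0x11B, used only for checking."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r
```

For example, `sbox(0x00)` evaluates to `0x63` and `sbox(0x01)` to `0x7C`, matching the standard AES table. In hardware, each helper corresponds to a small block of XOR and AND gates, which is what allows pipeline registers to be placed inside the S-box rather than around an indivisible look-up table.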

Full Text