Abstract

Multi-precision multiplication is one of the most fundamental operations needed to implement public-key cryptography, such as RSA and elliptic curve cryptography (ECC), on microprocessors. In this paper, we present a novel multiplication technique that improves performance through sophisticated caching of operands. Our method significantly reduces the number of required load instructions, which are usually among the most expensive operations on modern processors. We evaluate the new technique on an 8-bit ATmega128 and a 32-bit ARM7TDMI microcontroller and compare the results with existing solutions. On the ATmega128, our implementation needs only 2395 clock cycles for a 160-bit multiplication, and the number of required load instructions is reduced from 167 (needed for the best known hybrid multiplication) to only 80. On the ARM7TDMI, our implementation needs only 281 clock cycles as opposed to 357. On both platforms, the proposed technique outperforms related work by about 10–23%. We also show that the method scales very well to larger integer sizes (as required for RSA) and to limited register sets. It is fully compatible with the multiply–accumulate instructions that are integrated in most available processors.
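For orientation, the following is a minimal sketch of a conventional product-scanning (column-wise) multi-precision multiplication in C. It is not the operand-caching method proposed in the paper; it only illustrates the baseline setting the abstract refers to, in which every inner-loop step reads one word of each operand and feeds a multiply–accumulate. The function name `mp_mul_product_scanning`, the 32-bit word size, and the 96-bit accumulator layout are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Baseline product-scanning multiplication (illustrative sketch only,
 * NOT the paper's operand-caching technique).
 *
 * Computes r[0..2n-1] = a[0..n-1] * b[0..n-1], least significant word first.
 * r must not alias a or b. For 160-bit operands with 32-bit words, n = 5.
 */
void mp_mul_product_scanning(uint32_t *r, const uint32_t *a,
                             const uint32_t *b, size_t n)
{
    uint64_t acc_lo = 0; /* low 64 bits of the column accumulator */
    uint32_t acc_hi = 0; /* carries out of acc_lo (accumulator bits 64..95) */

    for (size_t k = 0; k + 1 < 2 * n; k++) {
        /* Column k sums all partial products a[i] * b[k - i]. */
        size_t i_min = (k < n) ? 0 : k - n + 1;
        size_t i_max = (k < n) ? k : n - 1;

        for (size_t i = i_min; i <= i_max; i++) {
            /* Each step reads one word of a and one word of b; on a
             * load/store machine these reads are the load instructions
             * that operand caching aims to reduce. */
            uint64_t p = (uint64_t)a[i] * b[k - i];
            acc_lo += p;
            if (acc_lo < p) {
                acc_hi++; /* the 64-bit addition wrapped around */
            }
        }

        r[k] = (uint32_t)acc_lo; /* store the finished result word */
        /* Shift the accumulator right by one 32-bit word. */
        acc_lo = (acc_lo >> 32) | ((uint64_t)acc_hi << 32);
        acc_hi = 0;
    }
    r[2 * n - 1] = (uint32_t)acc_lo; /* most significant result word */
}
```

Calling `mp_mul_product_scanning(r, a, b, 5)` with five-word arrays `a` and `b` and a ten-word array `r` would perform one 160-bit multiplication. In this naive form each operand word is read once per partial product, which is the kind of load traffic the abstract identifies as expensive.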
