Abstract
Masking is a well-loved and widely deployed countermeasure against side-channel attacks, in particular in software. Under certain assumptions (w.r.t. independence and noise level), masking provably prevents attacks up to a certain security order and leads to a predictable increase in the number of required leakages for successful attacks beyond this order. The noise level in typical processors where software masking is used may not be very high, thus low masking orders are not sufficient for real-world security. Higher-order masking, however, comes at a great cost, and therefore a number of techniques have been published over the years that make such implementations more efficient via parallelisation in the form of bit or share slicing. We take two highly regarded schemes (ISW and Barthe et al.), and some corresponding open-source implementations that make use of share slicing, and discuss their true security on an ARM Cortex-M0 and an ARM Cortex-M3 processor (both from the LPC series). We show that micro-architectural features of the M0 and M3 undermine the independence assumptions made in masking proofs, and thus their theoretical guarantees do not translate into practice (even worse, it seems unpredictable at which order leaks can be expected). Our results demonstrate how difficult it is to link theoretical security proofs to practical real-world security guarantees.
Highlights
Nowadays, masking is arguably one of the most prevalent countermeasures against side channel analysis
We demonstrate that the “independent leakage” assumption does not hold on an ARM Cortex-M0 and Cortex-M3
Strictly speaking, it still depends on the specific leakage model: in practice, we found this approach provided a better signal-to-noise ratio (SNR) in most cases
Summary
Nowadays, masking is arguably one of the most prevalent countermeasures against side-channel analysis. Although the authors did not directly link this scheme with any specific software or hardware implementation, its intrinsic parallelism and good performance for higher-order masking [JS17] have made it a prevalent choice for several bit-sliced masking implementations [JS17, GJRS18]. These implementations stipulate that all shares of a secret x should be stored within one register (denoted as “share-slicing” in this paper), which is a less-investigated option in previous studies of software masking [JS17]. As the XOR operation has a trivial shared form, we can decompose the Sbox into such gadgets and apply our side-channel protection in a divide-and-conquer manner. By default, this approach fits customized hardware better, as the proposed scheme is designed for a 1-bit computing unit. Note that even in this case, as field computation is hardly feasible for the same bit-width of modern CPUs, in practice these schemes might still come with some bit-slicing effort [GR17].
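To make the share-slicing layout and the "trivial shared form" of XOR concrete, the following is a minimal functional sketch, not code from any of the cited implementations. It assumes Boolean masking of a single secret bit, with share i stored at bit position i of one machine word; the names `share_bit`, `unshare_bit`, `xor_shared` and the share count `D = 4` are hypothetical choices made for illustration only.

```python
import secrets

D = 4  # number of shares; an illustrative assumption (must be >= 2 here)


def share_bit(x: int, d: int = D) -> int:
    """Split bit x into d Boolean shares packed into one word.

    Share i lives at bit position i, so the whole sharing sits in a
    single "register" (the share-slicing layout described above).
    """
    reg = secrets.randbits(d - 1)        # d-1 uniformly random shares in bits 0..d-2
    parity = bin(reg).count("1") & 1     # XOR of the random shares
    last = parity ^ (x & 1)              # final share forces the XOR of all shares to x
    return reg | (last << (d - 1))


def unshare_bit(reg: int) -> int:
    """Recombine the secret bit: XOR (parity) of all shares in the word."""
    return bin(reg).count("1") & 1


def xor_shared(a: int, b: int) -> int:
    """Shared XOR: each share of a is XORed with the matching share of b.

    With share-slicing this is a single bitwise XOR of the two words and
    needs no fresh randomness.
    """
    return a ^ b


if __name__ == "__main__":
    a, b = share_bit(1), share_bit(0)
    assert unshare_bit(xor_shared(a, b)) == 1 ^ 0
```

The sketch only checks functional correctness. It says nothing about leakage: placing all shares in one register is exactly the situation in which the independence assumption has to hold on the target processor, which is the property the abstract reports as violated on the Cortex-M0 and Cortex-M3.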