The bootstrapping procedure has become the main bottleneck affecting the efficiency of all known fully homomorphic encryption (FHE) schemes. The state-of-the-art scheme for efficient bootstrapping, which is called fully homomorphic encryption over the torus (TFHE), accelerates polynomial multiplication by leveraging number theoretic transform (NTT) and implementing NTT in parallel on a GPU. Unfortunately, almost none of the recent advancements in NTT take full advantage of a GPU, leading to the need for more time. With this in mind, in this work, a novel faster number theoretic transform based on a GPU is proposed, in which matrix multiplication is used to implement a decomposed small-point NTT. When implementing matrix multiplication, we introduce a merging preprocessing method to merge multiple inputs of the small-point NTT, aiming to effectively minimize the count of modulo operations. Subsequently, when the merged result is multiplied by rotation factors, we use logical left shift rather than arithmetic multiplication to improve the computational efficiency. Our scheme can easily be used to realize a 1024-point NTT and the results of the experiments show that the speedup ratio of our method over the butterfly algorithm is about 2.49.
Read full abstract