Abstract

Recently, parallel computation has become necessary to take full advantage of the gains allowed by Moore's law. Many scientific and engineering applications exhibit data parallelism but might not exploit it fully. Some ubiquitous operations, such as the dot product, can easily be parallelized and thus make good use of available hardware such as multi-core CPUs or GPUs. In this paper, we present two slightly different algorithms that perform dot product computations in a finite field using floating-point arithmetic, and we implement them on the GPU architecture. To do so, we pack the input integers into floating-point numbers and exploit the computational capabilities of the GPU to their full extent to obtain the result efficiently. Using error-free transformations, we show that the parallel versions reach speedups between 10 and 40, with one algorithm requiring almost no modular reductions.
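As a minimal illustration of the error-free transformations the abstract refers to, the sketch below implements an error-free product in the classical Dekker/Veltkamp style (without a fused multiply-add): it returns the rounded product together with the exact rounding error, so that `p + e` equals the true product `a*b` exactly. The function names `split` and `two_prod` are illustrative and not taken from the paper.

```python
def split(a):
    # Veltkamp splitting: write a = hi + lo, where hi and lo each
    # fit in about half the 53-bit double-precision significand.
    c = 134217729.0 * a  # splitting factor 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo

def two_prod(a, b):
    # Error-free transformation of a product (Dekker's TwoProduct):
    # returns (p, e) with p = fl(a*b) and p + e = a*b exactly.
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    e = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, e

# Example: (2**27 + 1)**2 is not representable in a double, so the
# plain product p loses the trailing bits, which reappear in e.
p, e = two_prod(134217729.0, 134217729.0)
```

In a modular dot product, such pairs of products and error terms can be accumulated separately and reduced modulo the field characteristic only at the end, which is the mechanism that allows an implementation with almost no modular reductions.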
