Abstract

This paper implements both sequential and parallel (CUDA) versions of matrix multiplication in order to examine the differences between them, followed by an analysis of the results. Using the algorithm elaborated later in the paper, we compare the two implementations' memory consumption and run time. We found that the parallel implementation runs on average 31.23 times faster than the sequential implementation on the same task. This speedup comes with a trade-off: the parallel implementation consumes more memory than the sequential one. The optimal number of threads for the parallel implementation must also be determined, which we do here by trial-and-error testing. Further research would involve more complex parallel programming implementations and a more controlled testing environment.
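To make the comparison concrete, the sketch below shows the standard pattern for such a study: a sequential triple-loop baseline on the CPU and a CUDA kernel in which each thread computes one output element, with the block dimension (here `TILE`) being the thread-count parameter that the paper tunes by trial and error. All names, the matrix size, and the tile size are illustrative assumptions, not the authors' actual code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

#define N 512     // assumed square matrix size
#define TILE 16   // threads per block dimension (the tunable parameter)

// Parallel version: one thread computes one element C[row][col].
__global__ void matMulKernel(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// Sequential baseline: classic triple loop on the host.
void matMulSeq(const float *A, const float *B, float *C, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

int main() {
    size_t bytes = (size_t)N * N * sizeof(float);
    float *hA = (float*)malloc(bytes), *hB = (float*)malloc(bytes);
    float *hC = (float*)malloc(bytes), *hRef = (float*)malloc(bytes);
    for (int i = 0; i < N * N; ++i) {
        hA[i] = rand() / (float)RAND_MAX;
        hB[i] = rand() / (float)RAND_MAX;
    }

    matMulSeq(hA, hB, hRef, N);  // CPU reference result

    // Note the extra memory cost of the parallel path: device copies of A, B, C.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE);
    matMulKernel<<<grid, block>>>(dA, dB, dC, N);
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // Check that both implementations agree.
    float maxErr = 0.0f;
    for (int i = 0; i < N * N; ++i)
        maxErr = fmaxf(maxErr, fabsf(hC[i] - hRef[i]));
    printf("max abs difference: %g\n", maxErr);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC); free(hRef);
    return 0;
}
```

Timing each path (e.g. with `cudaEvent_t` around the kernel and a wall-clock timer around `matMulSeq`) and varying `TILE` reproduces the kind of speedup and thread-count experiments the abstract describes.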
