Atomistic spin dynamics simulations provide valuable information about the energy spectrum of magnetic materials in different phases, allowing one to identify instabilities and the nature of their excitations. However, the time cost of evaluating the spin-spin correlation functions in real space increases quadratically as the number of spins N, leading to significant computational effort, making the simulation of large spin systems very challenging. In this work, we propose to use a highly optimized general matrix multiply (GEMM) subroutine to calculate the dynamical spin-spin correlation function that can achieve near-optimal hardware utilization. Furthermore, we fuse the element-wise operations in the calculation of S(q,t) into the in-house GEMM kernel, which results in further performance improvements of 44% - 71% on several relatively large lattice sizes when compared to the implementation that uses the GEMM subroutine in OpenBLAS, which is the state-of-the-art open source library for Basic Linear Algebra Subroutine (BLAS).