Self-calibrating GRAPPA operator gridding (SC-GROG) is a method by which non-Cartesian (NC) data in magnetic resonance imaging (MRI) are shifted to Cartesian k-space grid locations using the parallel imaging concept of the GRAPPA operator. However, gridding with SC-GROG becomes computationally expensive and prolongs reconstruction when a large number of NC samples must be mapped to their nearest Cartesian grid locations. This work aims to accelerate SC-GROG for radial acquisitions in MRI using the massively parallel architecture of graphics processing units (GPUs). For this purpose, a novel GPU-accelerated implementation of SC-GROG is presented that exploits the inherent parallelism of the gridding operations. The proposed method employs optimized look-up-table (LUT)-based compute unified device architecture (CUDA) kernels to pre-calculate all possible combinations of 2D gridding weight sets and applies the appropriate weight set to shift the NC signals from the multi-channel receiver coils to their nearest Cartesian grid locations. The LUTs are also used to avoid race conditions among the CUDA kernel threads when several NC points are shifted to the same Cartesian grid location. Several experiments using a 24-channel simulated phantom and 12- and 30-channel in vivo data sets were performed to evaluate the efficacy of the proposed method in terms of computation time and reconstruction accuracy. The results show that the GPU-based implementation of SC-GROG significantly improves image reconstruction efficiency, typically achieving a 6× to 30× speed-up (including data-transfer time between CPU and GPU memory) without compromising the quality of the reconstructed images.
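To illustrate the LUT-based gridding idea in the abstract, the following is a minimal serial NumPy sketch, not the paper's CUDA implementation: fractional shifts are quantized, one weight matrix per quantized 2D shift is pre-calculated from base unit-shift operators `Gx` and `Gy` (here assumed given, and raised to fractional powers by eigendecomposition), and each NC sample is then shifted to its nearest grid point by a table lookup. All function names, the quantization level, and the operators are illustrative assumptions. On a GPU, the `+=` accumulation into a shared grid cell is exactly where a race condition would arise among threads and would need atomics or LUT-guided scheduling; the serial loop below sidesteps it.

```python
import numpy as np

def frac_power(G, t):
    # Fractional matrix power G**t via eigendecomposition
    # (sketch: assumes G is diagonalizable).
    w, V = np.linalg.eig(G)
    return (V * w.astype(complex) ** t) @ np.linalg.inv(V)

def build_weight_lut(Gx, Gy, n_levels=10):
    # Pre-calculate every quantized 2D shift-weight combination once,
    # so the per-sample work reduces to a table lookup.
    shifts = (np.arange(n_levels) - n_levels // 2) / n_levels  # -0.5 .. 0.4
    lut = {}
    for i, dx in enumerate(shifts):
        Wx = frac_power(Gx, dx)
        for j, dy in enumerate(shifts):
            lut[(i, j)] = frac_power(Gy, dy) @ Wx
    return shifts, lut

def grog_grid(samples, coords, Gx, Gy, grid_size, n_levels=10):
    # samples: (n_samples, n_coils) complex signals from all receiver coils;
    # coords:  (n_samples, 2) NC k-space positions in grid units.
    shifts, lut = build_weight_lut(Gx, Gy, n_levels)
    grid = np.zeros((grid_size, grid_size, samples.shape[1]), dtype=complex)
    for s, (kx, ky) in zip(samples, coords):
        tx, ty = round(kx), round(ky)          # nearest Cartesian location
        i = int(np.argmin(abs(shifts - (tx - kx))))  # quantized shift index
        j = int(np.argmin(abs(shifts - (ty - ky))))
        # Serial accumulation; on a GPU this write needs atomics or
        # LUT-grouped thread scheduling to avoid a race condition.
        grid[tx % grid_size, ty % grid_size] += lut[(i, j)] @ s
    return grid
```

With identity unit-shift operators, a sample lying exactly on a grid point is deposited unchanged, which makes the sketch easy to sanity-check.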