The new edge-directed interpolation (NEDI) algorithm is non-iterative and orientation-adaptive. It achieves better edge performance than conventional bilinear and bicubic methods when enhancing remote sensing and natural images, and it is the theoretical foundation of many more complex regression- and auto-regression-based interpolation methods. Although NEDI performs impressively, its computational complexity is an obstacle to large-scale expansion. Parallel acceleration of NEDI therefore offers strong versatility and extensibility. In this paper, we propose a fine-grained GPU implementation of NEDI in which the calculation of each unknown pixel is assigned to a group of 2 × 2, 2 × 4, or 4 × 4 threads. On an NVIDIA Tesla K40C GPU with asynchronous I/O transfers, the fine-grained NEDI implementation achieves a 99.09-fold speedup, including I/O transfer time, over the original single-threaded C CPU code compiled with -O2 optimization on an Intel Core™ i7-920. To demonstrate the effectiveness of the fine-grained scheme, we also compare the fine- and coarse-grained schemes by interpolating 720p video to 1440p; adopting the fine-grained scheme, we achieve real-time display. The fine-grained parallel mode can be extended to other algorithms based on regression and auto-regression schemes.
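To make the fine-grained thread mapping concrete, the following CUDA sketch shows one possible realization of the 2 × 2 case: the four threads of a group split the accumulation of the local 4 × 4 autocorrelation matrix for a single unknown pixel and reduce it in shared memory. The kernel name, window size, launch shape, and the placeholder equal-weight step are illustrative assumptions, not the paper's implementation; classical NEDI would solve a 4 × 4 system for the interpolation weights where indicated.

    // A minimal sketch of the 2x2 fine-grained mapping, assuming one thread
    // block per unknown pixel for clarity (names and sizes are hypothetical).
    #include <cuda_runtime.h>

    #define WIN 4   // assumed 4x4 training window of low-resolution samples

    __global__ void nediPixelKernel(const float* lowRes, float* highRes,
                                    int lrW, int lrH, int hrW)
    {
        int px = blockIdx.x;                       // unknown pixel (x) in LR grid
        int py = blockIdx.y;                       // unknown pixel (y) in LR grid
        int lane = threadIdx.y * 2 + threadIdx.x;  // 0..3 within the 2x2 group
        if (px >= lrW - 1 || py >= lrH - 1) return;

        __shared__ float R[4][16];                 // per-thread partial 4x4 matrices
        __shared__ float r[4][4];                  // per-thread partial 4-vectors

        for (int k = 0; k < 16; ++k) R[lane][k] = 0.f;
        for (int k = 0; k < 4;  ++k) r[lane][k] = 0.f;

        // Each of the four threads handles every fourth sample of the window.
        for (int s = lane; s < WIN * WIN; s += 4) {
            int wx = px + (s % WIN) - WIN / 2;
            int wy = py + (s / WIN) - WIN / 2;
            if (wx < 1 || wy < 1 || wx >= lrW - 1 || wy >= lrH - 1) continue;

            // y: centre sample, c[0..3]: its four diagonal neighbours.
            float y = lowRes[wy * lrW + wx];
            float c[4] = { lowRes[(wy - 1) * lrW + (wx - 1)],
                           lowRes[(wy - 1) * lrW + (wx + 1)],
                           lowRes[(wy + 1) * lrW + (wx - 1)],
                           lowRes[(wy + 1) * lrW + (wx + 1)] };
            for (int i = 0; i < 4; ++i) {
                for (int j = 0; j < 4; ++j) R[lane][i * 4 + j] += c[i] * c[j];
                r[lane][i] += c[i] * y;
            }
        }
        __syncthreads();

        if (lane == 0) {
            // Reduce the four partial accumulators (the 2x2 cooperation step).
            for (int t = 1; t < 4; ++t) {
                for (int k = 0; k < 16; ++k) R[0][k] += R[t][k];
                for (int k = 0; k < 4;  ++k) r[0][k] += r[t][k];
            }
            // Placeholder: classical NEDI solves the 4x4 system R * w = r here;
            // equal weights are used only to keep the sketch short.
            float w[4] = { 0.25f, 0.25f, 0.25f, 0.25f };
            float out = w[0] * lowRes[py * lrW + px]
                      + w[1] * lowRes[py * lrW + (px + 1)]
                      + w[2] * lowRes[(py + 1) * lrW + px]
                      + w[3] * lowRes[(py + 1) * lrW + (px + 1)];
            highRes[(2 * py + 1) * hrW + (2 * px + 1)] = out;
        }
    }

A launch such as nediPixelKernel<<<dim3(lrW - 1, lrH - 1), dim3(2, 2)>>>(dLow, dHigh, lrW, lrH, 2 * lrW) would run one 2 × 2 group per unknown pixel; the 2 × 4 and 4 × 8 cases follow the same pattern, with more threads sharing the per-pixel accumulation before the reduction.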