Abstract

Parallel phase processing enables rapid phase extraction from off-axis digital holograms. To achieve fast and accurate results, we parallelized the phase reconstruction process using improved filtering algorithms and optimized programming strategies. First, we designed an adaptive filtering method based on the Chan-Vese (CV) model, which is better suited to parallel execution, to extract the +1-order spectrum. We then selected suitable Compute Unified Device Architecture (CUDA) libraries according to the characteristics of each key phase reconstruction step, and used acceleration techniques such as virtual memory and shared memory to improve computational efficiency. Furthermore, we combined an improved 4f optical imaging system with an embedded graphics processing unit (GPU) platform to build a low-cost phase reconstruction system for off-axis digital holography. To verify the feasibility of the method, we evaluated the reconstruction quality of the CV filtering method and compared phase retrieval run times on a central processing unit (CPU) and on the embedded GPU for off-axis holograms with different pixel sizes. Finally, we retrieved dynamic phase fluctuation maps of evaporating water droplets to demonstrate the real-time capability of the method.
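To make the +1-order spectrum extraction concrete, here is a minimal sketch of the standard Fourier-filtering route for off-axis phase recovery, applied to one simulated hologram line. It is a simplified stand-in, not the authors' method. It works in 1-D, uses a naive O(N²) DFT instead of a CUDA FFT library, and replaces the CV-based adaptive filter with a fixed-width rectangular window centered on the automatically located +1 peak. The function names `dft` and `reconstruct_phase`, the carrier bin, and the window half-width are hypothetical choices made for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (the inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def reconstruct_phase(hologram, half_width):
    """Recover the wrapped object phase from a 1-D off-axis hologram line.

    1. Transform the intensity to the frequency domain.
    2. Locate the +1-order term as the strongest bin in the positive-frequency
       half, skipping the DC bin that holds the zero-order term.
    3. Keep only bins within `half_width` of that peak (a fixed rectangular
       window, standing in for the CV-based adaptive filter; an assumption).
    4. Move the kept band so the carrier peak sits at bin 0, removing the tilt.
    5. Inverse-transform and take the argument of each complex sample.
    """
    n = len(hologram)
    spectrum = dft(hologram)
    peak = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    shifted = [0j] * n
    for k in range(peak - half_width, peak + half_width + 1):
        shifted[(k - peak) % n] = spectrum[k % n]
    field = dft(shifted, inverse=True)
    return [cmath.phase(v) for v in field], peak


if __name__ == "__main__":
    n, f0 = 64, 16  # samples per line and carrier frequency bin (hypothetical)
    true_phase = [0.5 * math.sin(2 * math.pi * t / n) for t in range(n)]
    # Intensity |R + O|^2 of a unit reference tilted at carrier f0 and a
    # unit-amplitude object wave exp(i * phase).
    hologram = [2 + 2 * math.cos(p + 2 * math.pi * f0 * t / n)
                for t, p in enumerate(true_phase)]
    phase, peak = reconstruct_phase(hologram, half_width=7)
    print(peak)  # 16
    print(max(abs(a - b) for a, b in zip(phase, true_phase)) < 1e-6)  # True
```

In a 2-D GPU implementation the two transforms would typically go to an FFT library such as cuFFT, while the masking and spectrum shift map naturally onto simple element-wise kernels. The window is exactly where an adaptive filter such as the CV-based one would replace the fixed half-width chosen here.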