Graphics processing units (GPUs) are a popular choice for accelerating iterative tomographic reconstruction algorithms. In this work, two dual-GPU approaches to iterative list-mode reconstruction of PET data are studied. The developed techniques split the image into equal-size cubes and distribute them between the GPUs. Line-driven forward and back projection kernels are implemented that cache a plane of the image in fast shared memory and incorporate a shift-invariant point spread function (PSF). In the first approach, each line is convolved individually with the PSF kernel, whereas the second approach convolves the reconstructed image with the PSF before the forward projection and after the back projection. The algorithms were implemented on two CUDA-compatible devices with no CPU/GPU data transfer during the reconstruction process. List-mode data obtained from Monte Carlo simulations were used to evaluate the methods. One complete iteration over 10⁷ events on an image of 220 × 220 × 110 voxels takes less than 510 ms for a non-PSF reconstruction and 520 to 5220 ms for a PSF reconstruction, depending on the PSF and the reconstruction parameters. The dual-GPU memory layout, implementation details, sample images, and performance metrics of the developed algorithms are discussed.
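
The second approach, in which the image is convolved with the PSF before the forward projection and after the back projection, can be illustrated with a minimal CPU sketch. This is not the CUDA implementation described in this work. It assumes a 1D image, a symmetric PSF kernel, events given as lists of (voxel index, weight) pairs, and an MLEM-style multiplicative update, none of which is specified above. All function names and the `sensitivity` argument are hypothetical.

```python
def convolve_same(image, kernel):
    """Zero-padded, same-size filtering of a 1D image with an odd-length kernel.

    For a symmetric kernel this operator equals its own transpose.
    """
    r = len(kernel) // 2
    out = [0.0] * len(image)
    for i in range(len(image)):
        for k, w in enumerate(kernel):
            j = i + k - r
            if 0 <= j < len(image):
                out[i] += w * image[j]
    return out


def forward_project(image, events):
    """Line integral of the image along each list-mode event."""
    return [sum(w * image[j] for j, w in ev) for ev in events]


def back_project(values, events, n_voxels):
    """Scatter one value per event back along its line."""
    out = [0.0] * n_voxels
    for v, ev in zip(values, events):
        for j, w in ev:
            out[j] += w * v
    return out


def psf_forward(image, events, kernel):
    """Blur the image with the PSF, then forward project."""
    return forward_project(convolve_same(image, kernel), events)


def psf_back(values, events, kernel, n_voxels):
    """Back project, then blur the result with the (symmetric) PSF."""
    return convolve_same(back_project(values, events, n_voxels), kernel)


def mlem_update(image, events, kernel, sensitivity):
    """One MLEM-style list-mode update (assumed update rule)."""
    expected = psf_forward(image, events, kernel)
    ratios = [1.0 / e if e > 0 else 0.0 for e in expected]
    correction = psf_back(ratios, events, kernel, len(image))
    return [x * c / s if s > 0 else 0.0
            for x, c, s in zip(image, correction, sensitivity)]
```

Because the PSF is applied in image space, its cost in this sketch depends on the image and kernel size rather than on the number of events, whereas convolving each line individually adds work to every event.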