Most existing underwater image enhancement methods focus on enhancing a single image. However, underwater images taken in the same scene often exhibit similar degradation characteristics and can therefore provide rich complementary information to each other. In this paper, a novel underwater image co-enhancement method based on physical-guided Transformer interaction (UICE-PTI) is proposed, which adopts a multi-scale encoder–decoder structure to effectively mine rich semantic information. Specifically, because the degradation of an underwater image is directly related to scene depth, a Dark Channel Prior-guided Transformer (DCPT) module is embedded into the framework before the preliminary feature extraction. Convolution operations in the preliminary feature extraction then address the local degradation of the underwater image. Next, to handle the non-local and heterogeneous degradation of underwater images across different channels and pixels, a CS-Transformer block with second-order statistics is proposed, which incorporates both channel and spatial Transformer modules. Furthermore, to exploit the rich complementary information between images of the same scene, a Feature Transformer Interaction Module (FTIM) is proposed to capture the correlation between the two branches at the bottleneck layer of the network. The proposed UICE-PTI can also be extended to underwater stereo image enhancement. Experimental results demonstrate the superior performance of the proposed UICE-PTI and the effectiveness of each module.
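
For readers unfamiliar with the dark channel prior that guides the DCPT module, the following minimal sketch computes the classical dark channel of an image: the per-pixel minimum over the colour channels, followed by a minimum over a local patch. It is an illustration only, not the implementation used in UICE-PTI. The function name `dark_channel`, the parameter `patch_radius`, and the nested-list image layout are assumptions made for this example, and how the DCPT module consumes the resulting map is not specified here.

```python
def dark_channel(image, patch_radius=1):
    """Classical dark channel of an RGB image (illustrative sketch).

    image: H x W nested list of (r, g, b) tuples with values in [0, 1].
    patch_radius: hypothetical parameter; the patch is a square of side
                  2 * patch_radius + 1, clipped at the image borders.
    """
    height = len(image)
    width = len(image[0])

    # Step 1: per-pixel minimum across the colour channels.
    channel_min = [[min(pixel) for pixel in row] for row in image]

    # Step 2: minimum over the local patch centred on each pixel.
    dark = []
    for y in range(height):
        y0, y1 = max(0, y - patch_radius), min(height, y + patch_radius + 1)
        row = []
        for x in range(width):
            x0, x1 = max(0, x - patch_radius), min(width, x + patch_radius + 1)
            row.append(min(channel_min[yy][xx]
                           for yy in range(y0, y1)
                           for xx in range(x0, x1)))
        dark.append(row)
    return dark


# Toy 3 x 3 image: uniform pixels with one darker pixel in the centre.
toy = [[(0.8, 0.9, 0.7)] * 3 for _ in range(3)]
toy[1][1] = (0.5, 0.2, 0.6)
dark_map = dark_channel(toy, patch_radius=1)
```

In the original haze-removal setting, low dark-channel values indicate little scattering, which is why the map serves as a rough, depth-related cue. Underwater variants of the prior differ in which channels they use, so this sketch should be read as the generic form only.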