Abstract

In underwater target detection, passive sonar is an important means of long-range detection. Sonar detections typically include both surface and underwater targets, and distinguishing between the two from sonar information alone is very challenging. Fusing sonar data with AIS (Automatic Identification System) data can exploit their complementary nature to compensate for this limitation. However, sonar and AIS information are acquired by different detection principles and systems; they are multi-source heterogeneous data with pronounced spatio-temporal misalignment. Existing fusion methods struggle to align sonar and AIS data in both time and space because of the complexity of the problem. In this study, the Dynamic Time Warping (DTW) algorithm is applied to align sonar and AIS data in the time domain. In addition, a deep learning algorithm with a multi-head attention mechanism is proposed to achieve spatial alignment of sonar and AIS data, which also matches the surface targets in the AIS data to the same targets in the sonar data. This matching provides a priori knowledge that enhances passive-sonar underwater target detection by eliminating interference from surface targets. The abstract features extracted from the intermediate layers of the attention-based neural network are found to represent the typical features of target motion trajectories effectively, which also demonstrates the effectiveness of the attention mechanism. Experimental results show that the proposed method achieves a matching success rate of over 95% between AIS targets and sonar detection targets.
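
As an illustration of the time-domain alignment step, the following is a minimal sketch of classic DTW, not the implementation used in this study. It assumes each track is reduced to a one-dimensional sequence (for example, bearing in degrees) and uses the absolute difference as the local distance; the function name `dtw_align` and the sample values are hypothetical.

```python
import math


def dtw_align(a, b):
    """Align two numeric sequences with classic DTW.

    Returns (total_cost, path), where path is a list of index pairs (i, j)
    meaning a[i] is matched to b[j]. The local distance is |a[i] - b[j]|,
    which is an assumption made for this sketch.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance a only
                cost[i][j - 1],      # advance b only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Hypothetical bearings: the sonar track has one extra sample because of a
# different sampling rate, so two sonar samples map to one AIS sample.
sonar_bearing = [10.0, 10.0, 12.0, 15.0]
ais_bearing = [10.0, 12.0, 15.0]
total_cost, path = dtw_align(sonar_bearing, ais_bearing)
print(total_cost, path)
```

The warping path pairs samples from the two sensors even when their sampling rates differ, which is the property that makes DTW suitable for temporally misaligned tracks. For multi-dimensional track features, the absolute difference would be replaced by a vector distance such as the Euclidean norm.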