Object matching can be viewed as an image patch matching problem and is widely used in image fusion, image retrieval, and other areas of computer vision. In this paper, we treat image matching as a joint classification and regression task and propose a visible and infrared image matching network named AMFFNet. AMFFNet adopts a Siamese structure: a residual network with an attention mechanism extracts feature maps from the two input images, and these feature maps are then fused. AMFFNet performs classification and regression on the fused feature maps to achieve matching. The classification branch identifies whether a predicted area is the object, while the regression branch determines the coordinates and size of the predicted box. To improve matching performance, we add a center-ness branch to the network and use the Generalized Intersection over Union (GIoU) loss during training. We reorganize an existing dataset to build a sufficiently large visible–infrared image matching dataset. Experimental results demonstrate that the proposed method achieves a higher matching mean Average Precision (mAP) than the compared methods.
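
To make the two training-related quantities named above concrete, the following is a minimal sketch of the GIoU loss and a center-ness score for axis-aligned boxes. It is an illustration only, not the AMFFNet implementation: the function names `giou`, `giou_loss`, and `centerness` are hypothetical, boxes are assumed to be given as `(x1, y1, x2, y2)` with positive area, and the center-ness formula is assumed to follow the common definition based on the distances from a location to the four sides of its box, which the abstract does not spell out.

```python
import math


def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(pred_box, target_box):
    """GIoU loss: 0 for a perfect match, approaching 2 for distant boxes."""
    return 1.0 - giou(pred_box, target_box)


def centerness(left, top, right, bottom):
    """Center-ness of a location from its (positive) distances to the box sides.

    Assumed definition: sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)).
    """
    horizontal = min(left, right) / max(left, right)
    vertical = min(top, bottom) / max(top, bottom)
    return math.sqrt(horizontal * vertical)
```

Unlike plain IoU, GIoU remains informative when the predicted and target boxes do not overlap, because the enclosing-box term still varies with the distance between them. The center-ness score equals 1 at the box center and decreases toward the edges, so it can be used to down-weight predictions made far from the object center.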