Although deep learning techniques have achieved notable success in aircraft detection, scale heterogeneity, position differences, complex background interference, and speckle noise make aircraft detection in large-scale synthetic aperture radar (SAR) images challenging. To address these problems, we propose a geospatial transformer framework and implement it as a three-step target detection neural network comprising image decomposition, a multiscale geospatial contextual attention network (MGCAN), and result recomposition. First, the given large-scale SAR image is decomposed into slices via sliding windows according to the image characteristics of aircraft. Second, the slices are fed into the MGCAN network for feature extraction, and cluster-distance non-maximum suppression (CD-NMS) is used to determine the aircraft bounding boxes. Finally, the detection results are produced via recomposition. Two novel geospatial attention modules are proposed within MGCAN, namely the efficient pyramid convolution attention fusion (EPCAF) module and the parallel residual spatial attention (PRSA) module, to extract multiscale aircraft features and suppress background noise. In the experiments, four large-scale 1-m-resolution SAR images from the Gaofen-3 system, none of which appear in the training dataset, are tested. The results indicate that the detection performance of the proposed geospatial transformer surpasses that of Faster R-CNN, SSD, EfficientDet-D0, and YOLOv5s. The geospatial transformer integrates deep learning with SAR target characteristics to fully capture the multiscale contextual and geospatial information of aircraft, effectively reduces complex background interference, and handles the position differences of targets. It greatly improves aircraft detection performance and offers an effective approach for merging SAR domain knowledge with deep learning techniques.
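The decompose–detect–recompose pipeline summarized above can be sketched as follows. The window size, stride, function names, and the plain greedy IoU-based NMS (a simplified stand-in for the paper's CD-NMS) are illustrative assumptions, not the authors' implementation.

```python
def decompose(h, w, win=512, stride=384):
    """Enumerate sliding-window slice origins (x0, y0) over an h x w image.
    Window/stride values are illustrative, not taken from the paper."""
    ys = list(range(0, max(h - win, 0) + 1, stride))
    xs = list(range(0, max(w - win, 0) + 1, stride))
    # Ensure the bottom/right image edges are covered by a final window.
    if h > win and ys[-1] != h - win:
        ys.append(h - win)
    if w > win and xs[-1] != w - win:
        xs.append(w - win)
    return [(x0, y0) for y0 in ys for x0 in xs]

def recompose(per_slice_boxes):
    """Map slice-local boxes [x1, y1, x2, y2, score] back to global
    image coordinates using each slice's (x0, y0) offset."""
    out = []
    for boxes, x0, y0 in per_slice_boxes:
        for x1, y1, x2, y2, s in boxes:
            out.append([x1 + x0, y1 + y0, x2 + x0, y2 + y0, s])
    return out

def nms(boxes, iou_thr=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2, score] boxes.
    A simplified stand-in for the paper's cluster-distance NMS (CD-NMS)."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    keep = []
    for b in sorted(boxes, key=lambda b: b[4], reverse=True):
        # Keep a box only if it does not heavily overlap a kept box.
        if all(iou(b, k) < iou_thr for k in keep):
            keep.append(b)
    return keep
```

Overlapping windows mean the same aircraft can be detected in two adjacent slices; running NMS after `recompose` removes such duplicates in global coordinates.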