Determining the attitude of a non-cooperative target in space is a frontier problem in aerospace engineering, with important applications in the state assessment of malfunctioning satellites and the detection of non-cooperative space targets. To address this problem, this paper proposes a non-cooperative target attitude estimation method based on deep learning of radar images from ground and space access (GSA) scenes. In a GSA scene, the observed target satellite can be imaged not only by inverse synthetic-aperture radar (ISAR) but also by space-based optical satellites, and the space-based optical images provide more accurate attitude estimates for the target. The spatial orientation of the line of intersection of the orbital planes of the target and observation satellites can be changed by fine-tuning the orbit of the observation satellite. This line of intersection is controlled so that it is collinear with the position vector of the target satellite when the target is accessible to the radar, which generates a series of GSA scenes. In these GSA scenes, high-precision attitude values of the target satellite can be estimated from the space-based optical images obtained by the observation satellite, which yields a correspondence between a series of ISAR images and the target attitude estimates at the same moments. Because the target attitude can be accurately estimated from the space-based optical images acquired in GSA scenes, these attitude estimates can be used as labels in the training datasets of ISAR images, and a deep network can be trained on the ISAR images of the GSA scenes. On this basis, this paper proposes an instantaneous attitude estimation method based on a deep network, which can achieve robust attitude estimation under different signal-to-noise ratio conditions.
First, ISAR observation and imaging models were created, and the theoretical projection relationship from the three-dimensional point cloud of the target to the ISAR imaging plane was constructed based on the radar line of sight. With the ISAR imaging plane fixed, the ISAR imaging results, the theoretical projection map, and the target attitude were in one-to-one correspondence, which meant that the mappings among them could be learned using a deep network. Specifically, to suppress noise interference, a UNet++ network with strong feature extraction ability was used to learn the mapping from the ISAR imaging results to the theoretical projection map, thereby achieving ISAR image enhancement. A shifted window (Swin) transformer was then used to learn the mapping from the enhanced ISAR images to the target attitude, thereby achieving instantaneous attitude estimation. Finally, the effectiveness of the proposed method was verified using electromagnetic simulation data, and the average attitude estimation error of the proposed method was found to be less than 1°.
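The projection from a three-dimensional point cloud to the ISAR imaging plane can be illustrated with a minimal sketch. It is not the model used in this paper: it assumes the common range-Doppler geometry, in which the range axis lies along the radar line of sight and the cross-range axis lies along the cross product of the line of sight and an effective rotation vector. The function name `project_to_isar_plane`, its parameters, and the sign convention of the cross-range axis are hypothetical choices made for illustration only.

```python
import numpy as np


def project_to_isar_plane(points, attitude, los, omega_eff):
    """Project body-frame scatterers onto an assumed range/cross-range plane.

    points    : (N, 3) scatterer positions in the target body frame
    attitude  : (3, 3) rotation matrix from the body frame to the observation frame
    los       : (3,) radar line-of-sight vector in the observation frame
    omega_eff : (3,) effective rotation vector (assumed known here)
    Returns an (N, 2) array of [range, cross-range] coordinates.
    """
    points = np.asarray(points, dtype=float)
    los = np.asarray(los, dtype=float)
    omega_eff = np.asarray(omega_eff, dtype=float)

    # Unit vector along the line of sight defines the range axis.
    range_axis = los / np.linalg.norm(los)

    # Cross-range axis: perpendicular to both the line of sight and the
    # rotation vector (sign convention is an assumption of this sketch).
    cross = np.cross(range_axis, omega_eff)
    norm = np.linalg.norm(cross)
    if norm < 1e-12:
        raise ValueError("rotation vector is parallel to the line of sight")
    cross_range_axis = cross / norm

    # Apply the target attitude, then take the two scalar projections.
    rotated = points @ np.asarray(attitude, dtype=float).T
    return np.stack([rotated @ range_axis, rotated @ cross_range_axis], axis=1)


# Usage: three unit scatterers, identity attitude, line of sight along x,
# effective rotation about z.
scatterers = np.eye(3)
projection = project_to_isar_plane(
    scatterers, np.eye(3), np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
)
```

With the imaging plane held fixed, changing `attitude` changes the projected coordinates, which is the dependence that the networks described above are trained to invert.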