High-resolution remote sensing images are characterized by complex backgrounds and densely clustered objects. A complex background introduces many irrelevant ground objects that closely resemble or overlap the targets, which blurs object edges and textures. As a result, recognition accuracy is low for ground objects such as airports, dams, and golf fields, even though these objects are large. To address this problem, this paper proposes a remote sensing image object detection method based on the YOLOv5 network. The backbone feature extraction network is improved by deepening the network structure so that it captures more information about large objects, and an attention mechanism and an additional output layer are added to enhance feature extraction and feature fusion. Pre-trained weights obtained through transfer learning are used to initialize the improved YOLOv5, which speeds up network convergence. Experiments on the DIOR dataset show that the improved YOLOv5 network significantly improves large-object recognition accuracy compared with the YOLO series networks and the EfficientDet model, and that it reaches an mAP of 80.5%, which is 2% higher than that of the original YOLOv5 network.
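
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean average precision (mAP) value of this kind is commonly computed. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names `average_precision` and `mean_average_precision` are hypothetical, the detections are toy values, and each detection is assumed to have already been matched to ground truth (for example by an IoU threshold, which the text above does not specify).

```python
from typing import Dict, List, Tuple

# One detection: (confidence score, whether it matched a ground-truth box).
# Matching by IoU is assumed to have been done upstream (assumption).
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Area under the interpolated precision-recall curve for one class."""
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for rank, (_, is_hit) in enumerate(ranked, start=1):
        true_positives += int(is_hit)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_gt)
    # Interpolate: precision at each point becomes the maximum precision
    # found at that point or at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas wherever recall increases.
    ap = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(
    per_class: Dict[str, Tuple[List[Detection], int]]
) -> float:
    """Average the per-class AP values; each class carries equal weight."""
    aps = [average_precision(dets, num_gt) for dets, num_gt in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Toy input (made-up values, not results from DIOR).
toy_results = {
    "airport": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "dam": ([(0.95, True)], 1),
}
toy_map = mean_average_precision(toy_results)
```

Because every class contributes equally to the mean, an improvement on large-object classes such as airports and dams raises mAP by the same amount as an equal improvement on any other class.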