At present, the speed of urbanization in China is constantly accelerating. At the same time, due to the severe situation of tight resource constraints, severe environmental pollution, and ecosystem degradation, vigorously promoting the construction of ecological civilization has become a key planning direction. However, traditional urban and rural ecological spatial planning is influenced by factors such as region, terrain, and spatial scale, which cannot adapt to the current spatial planning requirements. To achieve sustainable urban and rural ecological spatial planning, we propose a method that uses the optimized remote sensing images and convolutional neural networks to achieve spatial planning. In the analysis of the application effect of the usage method, the experimental results show that increasing the amount of data such as image size can improve the execution performance of the computer when the computer is not fully utilizing its resources and its computational volume fails to saturate the computational capacity. The parallel configuration designed in this experiment can accelerate the performance of the computer better, and the acceleration effect becomes more obvious as the difficulty of the algorithm increases. The Faster RCNN algorithm proposed in this experiment has the highest retrieval accuracy in the Flickr30K dataset and MS-COCO dataset compared with other algorithms. In Flickr30k data set, compared with other models in the table, the model used in this paper has the highest retrieval accuracy. The retrieval accuracy of R@1, R@5, R@10 increased by 23.1%, 8.1% and 5.3%, respectively. In MS-COCO data set, the retrieval accuracy increased by 19.2%, 13.1% and 8.3% respectively. The above results confirm that the combination of remote sensing images and convolutional neural network technology can perform simple ecological planning of a city's urban and rural areas, which proves that the method proposed in this experiment has practicality.