Cross-view image synthesis aims to synthesize a ground-view image covering the same geographic region as a given single aerial-view image (or vice versa). Existing approaches typically tackle this challenging task by relaxing the single-image constraint and using a ground-truth semantic map as an additional input to aid synthesis. However, such ground-truth maps are rarely available in practice. In this paper, we investigate how to generate a detail-enriched and structurally accurate ground-level image from only a single aerial-level input image, with no prior knowledge available other than the input image itself. Towards this goal, we propose a novel Progressive Parallel Generative Adversarial Network (PPGAN) that starts by generating low-resolution outputs and progressively produces ground images at higher resolutions as the network propagates forward. In this manner, our PPGAN decomposes the task into several manageable sub-tasks, which helps to generate detail-enriched and structurally accurate ground images. During progressive generation, the PPGAN adopts a parallel generation paradigm that enables the generator to produce multi-resolution images in parallel, thereby avoiding excessive training time. Furthermore, for effective information propagation across multi-resolution images, a feature fusion module (FFM) is devised to mitigate the domain gap between cross-level image features, balancing the synthesis of detail and structural information. Additionally, the proposed Channel-Space Attention Selection Module (CSASM) learns the mapping between cross-view images in a larger-scale feature space to enhance the quality of the output image. Quantitative and qualitative experiments demonstrate that our method, which requires only a single input image without the aid of additional inputs, is capable of synthesizing detail-enriched and structurally accurate ground images and outperforms existing state-of-the-art methods on two widely used benchmarks.
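To make the progressive parallel generation idea concrete, the following PyTorch sketch shows one way a generator can emit multi-resolution outputs in a single forward pass, with a simple fusion step propagating coarse features to finer stages. This is a minimal illustration under assumed design choices, not the authors' implementation: the class names (`FeatureFusion`, `ParallelProgressiveGenerator`), layer choices, and resolutions are hypothetical, the fusion is a placeholder for the actual FFM, and the CSASM is omitted entirely.

```python
# Minimal sketch of parallel multi-resolution generation (assumed design,
# not the actual PPGAN architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureFusion(nn.Module):
    """Illustrative stand-in for the paper's FFM: upsamples coarse-level
    features and merges them with the current level's features."""

    def __init__(self, channels: int):
        super().__init__()
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, coarse: torch.Tensor, fine: torch.Tensor) -> torch.Tensor:
        coarse_up = F.interpolate(coarse, scale_factor=2, mode="nearest")
        return F.relu(self.merge(torch.cat([coarse_up, fine], dim=1)))


class ParallelProgressiveGenerator(nn.Module):
    """Emits ground-view predictions at several resolutions in one forward
    pass, so all scales are trained together rather than stage by stage."""

    def __init__(self, channels: int = 64, num_stages: int = 3):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.stages = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, padding=1)
             for _ in range(num_stages)]
        )
        self.fusions = nn.ModuleList(
            [FeatureFusion(channels) for _ in range(num_stages - 1)]
        )
        # One RGB head per stage: every resolution is produced in parallel.
        self.to_rgb = nn.ModuleList(
            [nn.Conv2d(channels, 3, kernel_size=1) for _ in range(num_stages)]
        )

    def forward(self, aerial: torch.Tensor) -> list:
        outputs, feat = [], None
        for i, stage in enumerate(self.stages):
            # Coarser stages see a downsampled copy of the aerial input.
            scale = 2 ** (len(self.stages) - 1 - i)
            x = F.avg_pool2d(aerial, scale) if scale > 1 else aerial
            cur = F.relu(stage(F.relu(self.stem(x))))
            # Fuse the previous (coarser) stage's features into this stage.
            feat = cur if feat is None else self.fusions[i - 1](feat, cur)
            outputs.append(torch.tanh(self.to_rgb[i](feat)))
        return outputs  # low-to-high-resolution ground-view predictions


gen = ParallelProgressiveGenerator()
preds = gen(torch.randn(1, 3, 64, 64))
assert [p.shape[-1] for p in preds] == [16, 32, 64]
```

Under this sketch, each entry of `preds` could be supervised against a correspondingly downsampled ground-truth image, which is one way to realize the decomposition into manageable sub-tasks described above.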