Depth image-based rendering (DIBR) is a key technology in 2D-to-3D conversion: it renders virtual views from texture images and their associated depth maps. However, current DIBR systems still face challenging problems, such as the occurrence of disocclusions. Inpainting methods based on deep learning have recently shown significant improvements and can generate plausible images. Nevertheless, most of these methods do not handle the disocclusion holes in synthesized views well, for two reasons. On the one hand, they treat the issue merely as generative inpainting after 3D warping, rather than following the full DIBR processing pipeline. On the other hand, the holes in the virtual views are concentrated around the transition regions between foreground and background, which makes the two layers difficult to distinguish without special constraints. Motivated by these observations, this paper proposes a novel learning-based method for stereoscopic view synthesis, in which the disocclusion regions are restored by a progressive structure reconstruction strategy instead of direct texture inpainting. In addition, special cues in the synthesized scenes are further exploited as constraints that keep the network from hallucinating structure mixtures across different layers. Extensive empirical evaluations and comparisons validate the strengths of the proposed approach and demonstrate that the model is well suited to stereoscopic synthesis in 2D-to-3D conversion applications.