Generating depth images of occlusal dental crowns is complicated by the need to customize each case. To reduce the workload of skilled dental technicians, various computer vision models have been used to generate realistic occlusal crown depth images with well-defined crown surface structures, which can then be reconstructed into three-dimensional crowns and used directly in patient treatment. However, it has remained difficult for such models to generate the structure of dental crowns whose position varies from case to case. In this paper, we propose a two-stage model for generating depth images of occlusal crowns in diverse positions. The model consists of a segmentation stage and an inpainting stage, so that both the shape and the surface structure are generated accurately. The segmentation network focuses on the position and size of the crowns, which allows the model to adapt to diverse targets. The GAN-based inpainting network generates the curved structures of the crown surfaces from the target jaw image and a binary mask produced by the segmentation network. The performance of the model is evaluated quantitatively with area-detection metrics and pixel-value metrics. Compared to the baseline model, the proposed method reduced the MSE from 0.007001 to 0.002618 and increased the Dice score from 0.9333 to 0.9648. These results indicate that adding the segmentation network improved the binary mask and that the inpainting network improved the internal structure. The results also show that the proposed model restores realistic details better than other models.
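To illustrate the two kinds of evaluation mentioned above, the following is a minimal sketch of how a pixel-value metric (MSE) and an area-detection metric (Dice) could be computed for a generated depth image. It assumes that depth images are NumPy arrays scaled to [0, 1] and that the binary crown mask is obtained by thresholding the depth values; the function names and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error between two depth images of the same shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))


def depth_to_mask(depth, threshold=0.0):
    """Binary crown mask: pixels whose depth exceeds the threshold.

    Assumption: background pixels have depth 0, so any positive
    depth is treated as part of the crown.
    """
    return np.asarray(depth) > threshold


def dice(pred_mask, target_mask):
    """Dice coefficient 2|A n B| / (|A| + |B|) between two binary masks."""
    p = np.asarray(pred_mask, dtype=bool)
    t = np.asarray(target_mask, dtype=bool)
    total = p.sum() + t.sum()
    if total == 0:
        # Both masks are empty; treat this as perfect agreement.
        return 1.0
    intersection = np.logical_and(p, t).sum()
    return float(2.0 * intersection / total)


if __name__ == "__main__":
    # Toy 2x2 depth images standing in for generated and reference crowns.
    generated = np.array([[0.0, 0.5], [0.5, 0.0]])
    reference = np.array([[0.0, 0.5], [0.0, 0.0]])
    print("MSE :", mse(generated, reference))
    print("Dice:", dice(depth_to_mask(generated), depth_to_mask(reference)))
```

A lower MSE indicates closer agreement in depth values inside and around the crown, while a Dice score closer to 1 indicates closer agreement in the crown's position and extent.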