Traditional dental prosthetic work requires considerable labor and time. To simplify the process, a method was developed to convert tooth scans acquired with an intraoral scanner into 3D images for design. Furthermore, several studies have used deep learning to automate dental prosthetic processes. Training deep learning models requires tooth images, but these are difficult to use in research because they contain personal patient information. Therefore, we propose a method for generating virtual tooth images using image-to-image translation (pix2pix) and contextual reconstruction fill (CR-Fill). Pix2pix generates a variety of virtual images, which are then used as training images for CR-Fill so that real and virtual images can be compared to check that the generated teeth are well-shaped and meaningful. The experimental results demonstrate that the images generated by the proposed method are similar to real images. Training on virtual images alone did not perform well; however, training on both real and virtual images yielded results nearly identical to those obtained by training on real images alone.
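The three training-data configurations compared above (real only, virtual only, and real plus virtual) can be illustrated with a minimal sketch. The function name `build_training_sets`, the file names, and the shuffling step are assumptions made for illustration only; they are not taken from the authors' implementation, and the sketch does not include pix2pix or CR-Fill themselves.

```python
import random


def build_training_sets(real_images, virtual_images, seed=0):
    """Return three hypothetical training-set configurations.

    real_images / virtual_images are any sequences of image identifiers
    (for example, file paths). Nothing is loaded from disk here.
    """
    rng = random.Random(seed)  # fixed seed so the mixed order is reproducible

    # Mixed set: every real image plus every virtual image, shuffled together.
    mixed = list(real_images) + list(virtual_images)
    rng.shuffle(mixed)

    return {
        "real_only": list(real_images),
        "virtual_only": list(virtual_images),
        "real_plus_virtual": mixed,
    }


if __name__ == "__main__":
    # Placeholder file names, assumed for this example.
    real = [f"real_{i:03d}.png" for i in range(3)]
    virtual = [f"virtual_{i:03d}.png" for i in range(5)]
    for name, items in build_training_sets(real, virtual).items():
        print(name, len(items))
```

Each configuration would then be passed to the same training routine, so that any difference in results can be attributed to the composition of the training data rather than to the model.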