Abstract

Direct volume rendering (DVR) is an important tool for scientific and medical image visualization. Modern GPU acceleration has made DVR more accessible; however, producing high-quality rendered images at high frame rates remains computationally expensive. We propose a deep learning method that reduces this computational demand. We use a conditional generative adversarial network (cGAN) to upsample DVR images (rendered scenes) produced at a reduced sampling rate, so that their visual quality is similar to that of fully sampled renderings. Our network, dvrGAN, is combined with a colour-based loss function optimized for DVR images, in which different structures such as skin and bone are distinguished by assigning them distinct colours. By examining pixel-level colour, the loss function highlights structural differences between images and thus helps recover, for instance, small bones in the limbs that may not be evident at reduced sampling rates. We evaluated our method on DVR of human computed tomography (CT) and CT angiography (CTA) volumes. Compared with fully sampled rendering, our method retained image quality while reducing computation time, and it outperformed existing state-of-the-art upsampling methods.
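To make the idea of a pixel-level colour loss concrete, the following is a minimal illustrative sketch and not the loss used by dvrGAN, whose exact form is not given in this abstract. It assumes images are equally sized nested lists of RGB tuples with values in [0, 1]. The function name `colour_weighted_l1` and the parameters `threshold` and `boost` are hypothetical, chosen only to show how pixels with large colour differences (for example, a bone rendered in a distinct colour that is missing from the upsampled image) could be penalized more heavily than small shading differences.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # (r, g, b), each channel in [0, 1]
Image = List[List[Pixel]]            # rows of pixels


def colour_weighted_l1(pred: Image, target: Image,
                       threshold: float = 0.1,
                       boost: float = 4.0) -> float:
    """Hypothetical colour-weighted L1 loss (illustration only).

    Assumes `pred` and `target` have the same shape. The per-pixel
    colour difference is the mean absolute difference over the three
    channels. Pixels whose difference exceeds `threshold` are
    multiplied by `boost`, so colour mismatches that suggest a missing
    structure dominate the loss.
    """
    total = 0.0
    count = 0
    for row_pred, row_target in zip(pred, target):
        for p, t in zip(row_pred, row_target):
            diff = sum(abs(a - b) for a, b in zip(p, t)) / 3.0
            weight = boost if diff > threshold else 1.0
            total += weight * diff
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    reference = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]]
    slightly_off = [[(1.0, 1.0, 1.0), (0.03, 0.03, 0.03)]]
    structure_missing = [[(1.0, 1.0, 1.0), (0.9, 0.9, 0.9)]]
    print(colour_weighted_l1(slightly_off, reference))
    print(colour_weighted_l1(structure_missing, reference))
```

In a training setting, a term of this kind would typically be added to the adversarial loss of the cGAN; the threshold and weighting here are placeholders rather than values from the paper.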