Abstract

Direct volume rendering (DVR) is a widely used technique for three-dimensional visualization of volumetric medical images. A key goal of DVR is to enable users to visually emphasize regions of interest (ROIs) that may be occluded by other structures. Conventional methods for visually emphasizing ROIs require extensive user involvement to adjust the rendering parameters that reduce the occlusion, and these adjustments depend on the user's viewing direction. Several methods have been proposed to automatically preserve the view of the ROIs by eliminating occluding structures of lower importance in a view-dependent manner; however, they require pre-segmented labels and manual importance assignment for the images. An alternative to ROI segmentation is to use 'saliency' to identify important regions, but saliency lacks semantic information and thus leads to the inclusion of false-positive regions. In this study, we propose an attention-driven visual emphasis method for volumetric medical image visualization. We developed a deep learning attention model, termed the focused-class attention map (F-CAM), trained with only image-wise labels, for automated ROI localization and importance estimation. Our F-CAM transfers the semantic information from the classification task to the localization of ROIs, with a focus on the small ROIs that characterize medical images. Additionally, we propose an attention compositing module that integrates the generated attention map with the transfer function within the DVR pipeline to automate the view-dependent visual emphasis of the ROIs. We demonstrate the superiority of our method over existing methods on a multi-modality PET-CT dataset and an MRI dataset.
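To make the attention compositing idea concrete, the sketch below shows one plausible way an attention map could modulate per-sample opacity during front-to-back ray compositing in a DVR pipeline. This is a minimal illustration under stated assumptions, not the paper's actual module: the helper names (composite_ray, tf_color, tf_alpha), the power-law modulation, and the emphasis parameter are all hypothetical, and the attention values are assumed to be resampled along each viewing ray.

```python
import numpy as np

def composite_ray(intensities, attention, tf_color, tf_alpha, emphasis=4.0):
    """Front-to-back compositing of one viewing ray, with per-sample
    opacity modulated by an attention value in [0, 1].

    intensities : (N,) scalar values sampled along the ray
    attention   : (N,) attention values sampled at the same positions
    tf_color    : maps a scalar value to an RGB color (3,)
    tf_alpha    : maps a scalar value to a base opacity in [0, 1]
    emphasis    : hypothetical exponent controlling how strongly
                  low-attention (occluding) samples are suppressed
    """
    color = np.zeros(3)
    alpha = 0.0
    for v, a in zip(intensities, attention):
        # Suppress samples with low attention so occluders fade out,
        # while high-attention ROI samples keep their full opacity.
        sample_alpha = tf_alpha(v) * a ** emphasis  # assumed modulation rule
        c = np.asarray(tf_color(v), dtype=float)
        color += (1.0 - alpha) * sample_alpha * c
        alpha += (1.0 - alpha) * sample_alpha
        if alpha > 0.99:  # early ray termination once the ray is opaque
            break
    return color, alpha
```

Because the modulation is applied per sample along each ray, which samples get suppressed changes with the viewing direction, which is one way to realize the view-dependent emphasis the abstract describes.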