Abstract

Three-dimensional (3D) visual saliency is fundamental to vision-guided applications such as human–computer interaction in virtual reality, image quality assessment, object tracking, and event retrieval. Classical 3D visual saliency models can produce an accurate saliency map when the required depth maps or auxiliary cues are of sufficiently high quality. In practice, however, depth maps are often impaired by artifacts (such as holes or noise) caused by faults in stereo matching or multipath effects in range sensors. In these cases, such models face challenges because their core preliminary processes, such as the detection of low-level visual features, may fail. To address this problem, we propose a two-stage clustering-based 3D visual saliency model for predicting human visual fixations in dynamic scenarios. In this model, a two-stage clustering scheme is designed to mitigate the negative influence of impaired depth videos; with the help of this scheme, representative cues are selected for saliency modeling. Multimodal saliency maps are then obtained from depth, color, and 3D motion cues. Finally, a cross-Bayesian model is designed to pool the multimodal saliency maps. The experimental results demonstrate that the proposed two-stage clustering-based 3D saliency model outperforms other state-of-the-art models on a variety of metrics, and its consistency and robustness are also verified.
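The abstract describes the pooling step only at a high level; the exact cross-Bayesian formulation is given in the full text. As a rough, non-authoritative sketch of how per-cue saliency maps might be fused, the snippet below combines three maps under a naive per-pixel independence assumption (our simplification, not the authors' model). The function name `bayes_fuse` and the input maps are hypothetical.

```python
import numpy as np

def bayes_fuse(s_depth, s_color, s_motion, eps=1e-6):
    """Fuse per-cue saliency maps (values in [0, 1]) by treating each
    map as an independent per-pixel probability of saliency.

    NOTE: this naive-Bayes pooling is an illustrative stand-in only;
    it is not the cross-Bayesian model described in the paper.
    """
    # Clip away from 0 and 1 so the odds ratio below stays finite.
    maps = [np.clip(m, eps, 1.0 - eps) for m in (s_depth, s_color, s_motion)]

    # Posterior odds under conditional independence of the three cues
    # (uniform prior assumed): odds = prod_i m_i / (1 - m_i).
    odds = np.ones_like(maps[0])
    for m in maps:
        odds = odds * (m / (1.0 - m))

    # Convert odds back to a probability-like saliency map in (0, 1).
    return odds / (1.0 + odds)

# Hypothetical usage: three random 64x64 cue maps standing in for the
# depth, color, and 3D motion saliency maps.
rng = np.random.default_rng(0)
s_d, s_c, s_m = (rng.random((64, 64)) for _ in range(3))
fused = bayes_fuse(s_d, s_c, s_m)
print(fused.shape, float(fused.min()), float(fused.max()))
```

The independence assumption makes agreement between cues reinforce saliency multiplicatively; the paper's cross-Bayesian pooling presumably models interactions between the cues more carefully than this sketch does.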
