Generative Augmentation-Driven Prediction of Diverse Visual Scanpaths in Images

Ashish Verma, Debashis Sen

doi:10.1109/tai.2023.3278650

Abstract

Visual scanpaths of multiple humans on an image represent the process by which they capture the information in it. State-of-the-art models to predict visual scanpaths on images learn directly from recorded human visual scanpaths. However, generating multiple visual scanpaths on an image that exhibit the diversity of human visual scanpaths has not been explicitly considered. In this paper, we propose a deep network for predicting multiple diverse visual scanpaths on an image. Image-specific hidden Markov model based generative data augmentation is performed first to increase the number of image-visual scanpath training pairs. Considering a similarity between our generative data augmentation process and the use of long short-term memory (LSTM) for prediction, we propose an LSTM based visual scanpath predictor. A network to predict a single visual scanpath on an image is designed first. The network is then modified to predict multiple diverse visual scanpaths representing different viewer varieties by using a parameter indicating the uniqueness of a viewer. A random vector is also employed for subtle variations within scanpaths of the same viewer variety. Our models are evaluated on three standard datasets using multiple performance measures, which demonstrate the superiority of the proposed approach over the state-of-the-art. Empirical studies are also presented that indicate the significance of our generative data augmentation method and of our multiple scanpath prediction strategy in producing diverse visual scanpaths. Code link: https://ashishverma03.github.io/Diverse-Visual-Scanpath

Full Text