Visual saliency prediction plays a crucial role in Unmanned Aerial Vehicle (UAV) video analysis tasks. In this paper, an eye-tracking dataset for the immersive viewing of First-Person View (FPV) UAV videos is developed, consisting of 200 video clips captured by DJI FPV drones at 4K QHD resolution. The videos cover six genres and fourteen unique scenes. To study human visual attention during FPV video viewing, fixation points are recorded with an eye tracker integrated into a VR headset. Based on this dataset, a simple yet effective FPV UAV video Saliency prediction model (FUAVSal) is proposed as a baseline; it jointly considers spatial–temporal features, camera motion information, and an FPV prior. To establish benchmarks for saliency prediction in immersive FPV UAV video viewing, sixteen computational models are evaluated on the dataset, and detailed quantitative and qualitative comparisons are provided. The developed dataset and benchmarks aim to facilitate research on visual saliency prediction for First-Person View UAV videos.