Visual Attention Prediction Research Articles

Humans race drones faster than neural networks trained for end-to-end autonomous flight. This may be related to the ability of human pilots to select task-relevant visual information effectively. This work investigates whether neural networks capable of imitating human eye gaze behavior and attention can improve neural networks' performance for the challenging task of vision-based autonomous drone racing. We hypothesize that gaze-based attention prediction can be an efficient mechanism for visual information selection and decision making in a simulator-based drone racing task. We test this hypothesis using eye gaze and flight trajectory data from 18 human drone pilots to train a visual attention prediction model. We then use this visual attention prediction model to train an end-to-end controller for vision-based autonomous drone racing using imitation learning. We compare the drone racing performance of the attention-prediction controller to those using raw image inputs and image-based abstractions (i.e., feature tracks). Comparing success rates for completing a challenging race track by autonomous flight, our results show that the attention-prediction based controller (88% success rate) outperforms the RGB-image (61% success rate) and feature-tracks (55% success rate) controller baselines. Furthermore, visual attention-prediction and feature-track based models showed better generalization performance than image-based models when evaluated on hold-out reference trajectories. Our results demonstrate that human visual attention prediction improves the performance of autonomous vision-based drone racing agents and provides an essential step towards vision-based, fast, and agile autonomous flight that eventually can reach and even exceed human performances.

Read full abstract

In this paper, we present a novel end-to-end learning neural network, i.e., MATNet, for zero-shot video object segmentation (ZVOS). Motivated by the human visual attention behavior, MATNet leverages motion cues as a bottom-up signal to guide the perception of object appearance. To achieve this, an asymmetric attention block, named Motion-Attentive Transition (MAT), is proposed within a two-stream encoder network to firstly identify moving regions and then attend appearance learning to capture the full extent of objects. Putting MATs in different convolutional layers, our encoder becomes deeply interleaved, allowing for close hierarchical interactions between object apperance and motion. Such a biologically-inspired design is proven to be superb to conventional two-stream structures, which treat motion and appearance independently in separate streams and often suffer severe overfitting to object appearance. Moreover, we introduce a bridge network to modulate multi-scale spatiotemporal features into more compact, discriminative and scale-sensitive representations, which are subsequently fed into a boundary-aware decoder network to produce accurate segmentation with crisp boundaries. We perform extensive quantitative and qualitative experiments on four challenging public benchmarks, i.e., DAVIS16, DAVIS17, FBMS and YouTube-Objects. Results show that our method achieves compelling performance against current state-of-the-art ZVOS methods. To further demonstrate the generalization ability of our spatiotemporal learning framework, we extend MATNet to another relevant task: dynamic visual attention prediction (DVAP). The experiments on two popular datasets (i.e., Hollywood-2 and UCF-Sports) further verify the superiority of our model. Our implementations have been made publicly available at https://github.com/tfzhou/MATNet.

Read full abstract

Visual Attention Prediction Research Articles

Related Topics

Articles published on Visual Attention Prediction

How is Visual Attention Influenced by Text Guidance? Database and Model.

Prediction of Driver's Visual Attention in Critical Moment Using Optical Flow

Continuous Prediction of Web User Visual Attention on Short Span Windows Based on Gaze Data Analytics.

Image caption generation using Visual Attention Prediction and Contextual Spatial Relation Extraction

AdaFI-FCN: an adaptive feature integration fully convolutional network for predicting driver’s visual attention

SST-Sal: A Spherical Spatio-Temporal Approach for Saliency Prediction in 360º Videos

Domain classifier-based transfer learning for visual attention prediction

Visual attention prediction improves performance of autonomous drone racing agents.

SalyPath360: Saliency and scanpath prediction framework for omnidirectional images

Deep Convolutional Symmetric Encoder—Decoder Neural Networks to Predict Students’ Visual Attention

Prediction of visual attention in embodied real-world tasks

Predicting user visual attention in virtual reality with a deep learning model

HammerDrive: A Task-Aware Driving Visual Attention Model

Visual attention prediction for Autism Spectrum Disorder with hierarchical semantic fusion

Investigating Visual Attention-based Traffic Accident Detection Model

Spatio-temporal visual attention modelling of standard biometry plane-finding navigation.

High-Resolution Neural Network for Driver Visual Attention Prediction.

Paying Attention to Video Object Pattern Understanding.

A Multimodal Saliency Model for Videos with High Audio-Visual Correspondence.

MATNet: Motion-Attentive Transition Network for Zero-Shot Video Object Segmentation.

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Visual Attention Prediction Research Articles

Related Topics

Articles published on Visual Attention Prediction

How is Visual Attention Influenced by Text Guidance? Database and Model.

Prediction of Driver's Visual Attention in Critical Moment Using Optical Flow

Continuous Prediction of Web User Visual Attention on Short Span Windows Based on Gaze Data Analytics.

Image caption generation using Visual Attention Prediction and Contextual Spatial Relation Extraction

AdaFI-FCN: an adaptive feature integration fully convolutional network for predicting driver’s visual attention

SST-Sal: A Spherical Spatio-Temporal Approach for Saliency Prediction in 360º Videos

Domain classifier-based transfer learning for visual attention prediction

Visual attention prediction improves performance of autonomous drone racing agents.

SalyPath360: Saliency and scanpath prediction framework for omnidirectional images

Deep Convolutional Symmetric Encoder—Decoder Neural Networks to Predict Students’ Visual Attention

Prediction of visual attention in embodied real-world tasks

Predicting user visual attention in virtual reality with a deep learning model

HammerDrive: A Task-Aware Driving Visual Attention Model

Visual attention prediction for Autism Spectrum Disorder with hierarchical semantic fusion

Investigating Visual Attention-based Traffic Accident Detection Model

Spatio-temporal visual attention modelling of standard biometry plane-finding navigation.

High-Resolution Neural Network for Driver Visual Attention Prediction.

Paying Attention to Video Object Pattern Understanding.

A Multimodal Saliency Model for Videos with High Audio-Visual Correspondence.

MATNet: Motion-Attentive Transition Network for Zero-Shot Video Object Segmentation.