ArUco/Gaze Tracking in Real Environments

Abstract

The emergence of affordable mobile eye-trackers has made it possible to study gaze behavior in real-world environments. However, mapping gaze from the recorded video to a static reference image remains a complex and open problem. Finding a reference image within the video frames, i.e., image matching, can give satisfying results, but occluded or overlapping objects are almost impossible to locate using this technique. We suggest using ArUco fiducial markers (and their associated software library available in OpenCV) to map gaze to dynamic Areas Of Interest (AOIs) within a reference image. Although such markers have been used previously, technical details of marker detection and mapping have been sparse. The current approach consists of three steps: (1) define AOIs using markers, (2) resolve any conflicts among overlapping AOIs, and (3) map the gaze point to the reference image. A dynamic AOI can be defined using one or more corner markers. When camera rotations are limited and the object is roughly orthogonal to the camera, it is possible to define an AOI using only one corner marker. When the camera rotates, pose estimation is required to project the corner points onto the camera image plane. An AOI can also be defined with four corner markers, which is robust to camera rotations and requires no a priori knowledge of the physical dimensions of the object. The two approaches can be combined: for example, when using four corner markers and one of them is lost (due to occlusion or viewing angle), the basis vectors can be used to interpolate the position of the lost marker. When two or more AOIs overlap and all the markers are tracked, gaze should be assigned to the AOI closest to the camera. The distance to an object can be estimated from the known physical length of the object, the number of pixels it spans in the image, and the pre-computed camera focal length. Once the AOIs are defined and marker overlaps are resolved, the gaze point can be mapped to the coordinates of the reference image using homography.
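
A minimal Python/OpenCV sketch of the detection, distance, and homography steps, assuming the OpenCV 4.7+ ArUco API (older versions expose cv2.aruco.detectMarkers instead); the marker IDs and reference-image coordinates below are illustrative assumptions, not the authors' setup:

```python
import cv2
import numpy as np

# Hypothetical layout: markers with IDs 0-3 sit at the four corners of one
# AOI, and these are their centre positions in the reference image (pixels).
REF_CORNERS = {0: (0.0, 0.0), 1: (800.0, 0.0), 2: (800.0, 600.0), 3: (0.0, 600.0)}

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(aruco_dict, cv2.aruco.DetectorParameters())

def map_gaze_to_reference(frame, gaze_xy):
    """Map a gaze point from the scene-camera frame into reference-image
    coordinates via the homography defined by the detected markers."""
    corners, ids, _ = detector.detectMarkers(frame)
    if ids is None:
        return None
    src, dst = [], []
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        if int(marker_id) in REF_CORNERS:
            src.append(marker_corners[0].mean(axis=0))  # marker centre in frame
            dst.append(REF_CORNERS[int(marker_id)])
    if len(src) < 4:
        return None  # a homography needs at least four correspondences
    H, _ = cv2.findHomography(np.float32(src), np.float32(dst))
    return cv2.perspectiveTransform(np.float32([[gaze_xy]]), H)[0, 0]

def distance_to_object(focal_px, length_m, span_px):
    """Pinhole-model estimate Z = f * L / p, usable to decide which of two
    overlapping AOIs lies closer to the camera."""
    return focal_px * length_m / span_px
```

In practice each detected marker contributes four corner points, so a single visible marker can already anchor the mapping when the object stays roughly fronto-parallel, matching the one-marker case described above.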

Similar Papers
  • Research Article
  • Cited by 11
  • 10.4018/ijmdem.2021010104
Automated Filtering of Eye Movements Using Dynamic AOI in Multiple Granularity Levels
  • Jan 1, 2021
  • International Journal of Multimedia Data Engineering and Management
  • Gavindya Jayawardena + 1 more

Eye-tracking experiments involve areas of interest (AOIs) for the analysis of eye gaze data. While there are tools to delineate AOIs to extract eye movement data, they may require users to manually draw boundaries of AOIs on eye-tracking stimuli or use markers to define AOIs. This paper introduces two novel techniques to dynamically filter eye movement data from AOIs for the analysis of eye metrics at multiple levels of granularity. The authors incorporate pre-trained object detectors and object instance segmentation models for offline detection of dynamic AOIs in video streams. This research presents the implementation and evaluation of object detectors and object instance segmentation models to find the best model to integrate into a real-time eye movement analysis pipeline. The authors filter gaze data that falls within the polygonal boundaries of detected dynamic AOIs and apply an object detector to find bounding boxes in a public dataset. The results indicate that the dynamic AOIs generated by object detectors capture 60% of eye movements and object instance segmentation models capture 30% of eye movements.
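
As a rough illustration of the filtering step this abstract describes, a gaze sample can be tested against a detected AOI polygon with a standard point-in-polygon check; this is a generic sketch, not the authors' pipeline, and the (timestamp, x, y) sample format is an assumption:

```python
import cv2
import numpy as np

def filter_gaze_by_aoi(gaze_samples, aoi_polygon):
    """Keep only the (timestamp, x, y) gaze samples that fall inside
    the polygonal boundary of a detected dynamic AOI."""
    contour = np.asarray(aoi_polygon, dtype=np.float32).reshape(-1, 1, 2)
    return [(t, x, y) for t, x, y in gaze_samples
            if cv2.pointPolygonTest(contour, (float(x), float(y)), False) >= 0]
```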

  • Conference Article
  • Cited by 1
  • 10.5244/c.10.11
Automatically Locating an Area of Interest and Maintaining a Reference Image to Aid the Real-Time Tracking of Objects
  • Jan 1, 1996
  • M.T. Cornish + 1 more

Real-time tracking systems often make use of a reference image and, where processing power is limited, an 'area of interest'. Obtaining such information commonly requires user interaction. Typically, a reference image is obtained by the user capturing an image when the objects being tracked are not present in the scene, and an 'area of interest' may also be defined by the user; both are specific to a particular scene. An alternative approach that automatically generates and maintains an up-to-date reference image is investigated in this paper. This consists, essentially, of 'cutting and pasting' areas of image from a sequence of frames to obtain an image containing no moving objects. Furthermore, a method for automatically generating an 'area of interest' is described. This method identifies areas of movement in a sequence of frames in order to build the 'area of interest'. These techniques have been successfully developed and proven using video sequences of more than one traffic roundabout.
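
The paper's 'cut and paste' reconstruction has a common modern analogue: a per-pixel temporal median over a frame stack, which likewise suppresses transient moving objects. A minimal sketch under that substitution (not the paper's exact method):

```python
import numpy as np

def reference_from_frames(frames):
    """Per-pixel temporal median of a list of equal-size frames yields
    an approximate reference image free of moving objects."""
    return np.median(np.stack(frames, axis=0), axis=0).astype(np.uint8)
```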

  • Research Article
  • Cited by 35
  • 10.1007/s00464-018-6513-5
Eye tracking in surgical education: gaze-based dynamic area of interest can discriminate adverse events and expertise.
  • Oct 19, 2018
  • Surgical Endoscopy
  • Eric Fichtel + 6 more

Eye-gaze metrics derived from areas of interest (AOIs) have been suggested to be effective for surgical skill assessment. However, prior research is mostly based on static images and simulated tasks that may not translate to complex and dynamic surgical scenes. Therefore, eye-gaze metrics must advance to account for changes in the location of important information during a surgical procedure. We developed a dynamic AOI generation technique based on eye gaze collected from an expert viewing surgery videos. This AOI updated as the gaze of the expert moved with changes in the surgical scene. The technique was evaluated through an experiment recruiting a total of 20 attendings and residents to view 10 videos with adverse events and another 10 without. Dwell time percentage (i.e., gaze duration) inside the AOI differentiated video type (U = 13508.5, p < 0.001) between videos with the presence (Mdn = 16.75) versus absence (Mdn = 19.95) of adverse events. This metric also differentiated participant group (U = 14029.5, p < 0.001) between attendings (Mdn = 15.45) and residents (Mdn = 19.80). This indicates that our dynamic AOIs, reflecting expert eye gaze, were able to differentiate expertise and the presence of unexpected adverse events. This dynamic AOI generation technique produced dynamic AOIs for deriving eye-gaze metrics that were sensitive to expertise level and event characteristics.
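
A hedged sketch of the reported metric: with uniformly sampled gaze, dwell-time percentage reduces to the share of samples landing inside the dynamic AOI (the AOI membership test `in_aoi` is an assumed helper, not from the paper):

```python
def dwell_time_percentage(gaze_samples, in_aoi):
    """Percentage of gaze samples inside the AOI; with a constant sampling
    rate this equals the share of viewing time spent there."""
    hits = sum(1 for sample in gaze_samples if in_aoi(sample))
    return 100.0 * hits / len(gaze_samples)
```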

  • Conference Article
  • Cited by 6
  • 10.1109/iri49571.2020.00018
Automated Filtering of Eye Gaze Metrics from Dynamic Areas of Interest
  • Aug 1, 2020
  • Gavindya Jayawardena + 1 more

Eye-tracking experiments usually involve areas of interest (AOIs) for the analysis of eye gaze data, as they can reveal cognitive load and attentional patterns of participants. While there are tools to define AOIs to extract eye movement data for the analysis of gaze measurements, they may require users to draw boundaries of AOIs on eye-tracking stimuli manually or use markers to define AOIs in space to generate AOI-mapped gaze locations. In this paper, we introduce a novel method to dynamically filter eye movement data from AOIs for the analysis of advanced eye gaze metrics. We incorporate pre-trained object detectors for offline detection of dynamic AOIs in dynamic eye-tracking stimuli such as video streams. We present our implementation and evaluation of object detectors to find the best object detector to integrate into a real-time eye movement analysis pipeline that filters eye movement data falling within the polygonal boundaries of detected dynamic AOIs. Our results indicate the utility of our method when applied to a publicly available dataset.

  • Book Chapter
  • Cited by 3
  • 10.1007/978-3-030-00114-8_55
How Will Humans Cut Through Automated Vehicle Platoons in Mixed Traffic Environments? A Simulation Study of Drivers’ Gaze Behaviors Based on the Dynamic Areas of Interest
  • Jan 1, 2019
  • Xiang Guo + 5 more

With higher levels of automation (LoA) in vehicles, mixed traffic environments are expected to emerge and last until the transition to fully autonomous traffic. So far, there is limited research on this transitional state regarding how humans in semi-automated vehicles interact with autonomous vehicles on the road. Knowledge of the interactive behavior of humans in this situation can help resolve the uncertainty in mixed traffic environments. This study investigated manual drivers' gaze behaviors during the targeted action of changing lanes and cutting through a platoon of fully automated vehicles, with the goal of safely exiting a highway. This scenario of manually cutting through a platoon running under Cooperative Adaptive Cruise Control (CACC) was developed using model-based simulation software (PreScan, TASS International) and tested in a driving simulator with a controlled experimental scheme. The resultant gaze behaviors, captured during the experiments using eye-tracking glasses (ETG2, SMI), were analyzed by mapping the gaze vectors onto specified areas of interest (AOIs) in the visual field of view. This paper focuses on applying a deep learning algorithm for automated detection and tracking of two dynamic AOIs: (1) the leading vehicle of the platoon in the middle lane, pertaining to the perceived distance and amount of time the driver has to make a lane change, and (2) the road center, outlined by the detection of lane and road boundaries, which serves as a frame of reference for visual attention in the primary driving task. By training on over a thousand images, the detection accuracy of the two AOIs reached 99.85% and 71.95%, respectively. The mapping of gaze vectors onto the AOIs showed that, with a shorter platoon time headway (THW), drivers spent a longer time fixating on the leading vehicle of the platoon, and the average fixation time on the road center became longer while the percentage of gaze on the road center remained unchanged. These findings imply significant changes in cognitive workload with varying platoon time headways in a mixed traffic environment.

  • Research Article
  • Cited by 3
  • 10.1177/21695067231192929
Shifting Perspectives: A proposed framework for analyzing head-mounted eye-tracking data with dynamic areas of interest and dynamic scenes.
  • Sep 1, 2023
  • Proceedings of the Human Factors and Ergonomics Society Annual Meeting
  • Haroula M Tzamaras + 3 more

Eye-tracking is a valuable research method for understanding human cognition and is readily employed in human factors research, including human factors in healthcare. While wearable mobile eye-trackers have become more readily available, there are no existing analysis methods for accurately and efficiently mapping dynamic gaze data onto dynamic areas of interest (AOIs), which limits their utility in human factors research. The purpose of this paper is to outline a proposed framework for automating the analysis of dynamic AOIs by integrating computer vision and machine learning (CVML). The framework is then tested using a use case of a Central Venous Catheterization trainer with six dynamic AOIs. While the results of the validity trial indicate there is room for improvement in the proposed CVML method, the framework provides direction and guidance for human factors researchers using dynamic AOIs.

  • Conference Article
  • Cited by 52
  • 10.1109/cvpr.2019.00578
Visual Localization by Learning Objects-Of-Interest Dense Match Regression
  • Jun 1, 2019
  • Philippe Weinzaepfel + 3 more

We introduce a novel CNN-based approach for visual localization from a single RGB image that relies on densely matching a set of Objects-of-Interest (OOIs). In this paper, we focus on planar objects which are highly descriptive in an environment, such as paintings in museums or logos and storefronts in malls or airports. For each OOI, we define a reference image for which 3D world coordinates are available. Given a query image, our CNN model detects the OOIs, segments them and finds a dense set of 2D-2D matches between each detected OOI and its corresponding reference image. Given these 2D-2D matches, together with the 3D world coordinates of each reference image, we obtain a set of 2D-3D matches from which solving a Perspective-n-Point problem gives a pose estimate. We show that 2D-3D matches for reference images, as well as OOI annotations, can be obtained for all training images from a single instance annotation per OOI by leveraging Structure-from-Motion reconstruction. We introduce a novel synthetic dataset, VirtualGallery, which targets challenges such as varying lighting conditions and different occlusion levels. Our results show that our method achieves high precision and is robust to these challenges. We also experiment using the Baidu localization dataset captured in a shopping mall. Our approach is the first deep regression-based method to scale to such a large environment.
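
The final step the abstract describes is the standard Perspective-n-Point problem; a generic OpenCV sketch, assuming the 2D-3D correspondences and the intrinsics matrix K are supplied by the upstream matcher (not the authors' released code):

```python
import cv2
import numpy as np

def pose_from_matches(pts2d, pts3d, K, dist_coeffs=None):
    """Recover camera rotation and translation from 2D-3D matches with
    RANSAC-robustified PnP."""
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        np.asarray(pts3d, dtype=np.float32),
        np.asarray(pts2d, dtype=np.float32),
        K, dist_coeffs)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # axis-angle vector -> 3x3 rotation matrix
    return R, tvec
```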

  • Research Article
  • Cited by 25
  • 10.1016/j.exger.2021.111342
Dual-tasking impacts gait, cognitive performance, and gaze behavior during walking in a real-world environment in older adult fallers and non-fallers.
  • Apr 7, 2021
  • Experimental Gerontology
  • Lisa A Zukowski + 4 more

  • Abstract
  • 10.1016/j.jsxm.2022.03.543
The role of visual attention toward sexual attributes in men and women's sexual arousal and thoughts
  • May 1, 2022
  • The Journal of Sexual Medicine
  • J Carvalho + 1 more

  • Research Article
  • Cited by 17
  • 10.1016/j.optlaseng.2019.105894
Strain determination based on strain gauge-guided radial basis function and digital image correlation
  • Oct 11, 2019
  • Optics and Lasers in Engineering
  • Xiangjun Dai + 9 more

  • Research Article
  • Cited by 20
  • 10.1016/j.cviu.2015.03.005
Hierarchical temporal graphical model for head pose estimation and subsequent attribute classification in real-world videos
  • May 24, 2015
  • Computer Vision and Image Understanding
  • Meltem Demirkus + 3 more

  • Conference Article
  • Cited by 126
  • 10.1109/cvpr46437.2021.00364
VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval
  • Jun 1, 2021
  • Sijie Zhu + 2 more

Cross-view image geo-localization aims to determine the locations of street-view query images by matching with GPS-tagged reference images from aerial view. Recent works have achieved surprisingly high retrieval accuracy on city-scale datasets. However, these results rely on the assumption that there exists a reference image exactly centered at the location of any query image, which is not applicable for practical scenarios. In this paper, we redefine this problem with a more realistic assumption that the query image can be arbitrary in the area of interest and the reference images are captured before the queries emerge. This assumption breaks the one-to-one retrieval setting of existing datasets as the queries and reference images are not perfectly aligned pairs, and there may be multiple reference images covering one query location. To bridge the gap between this realistic setting and existing datasets, we propose a new large-scale benchmark –VIGOR– for cross-View Image Geo-localization beyond One-to-one Retrieval. We benchmark existing state-of-the-art methods and propose a novel end-to-end framework to localize the query in a coarse-to-fine manner. Apart from the image-level retrieval accuracy, we also evaluate the localization accuracy in terms of the actual distance (meters) using the raw GPS data. Extensive experiments are conducted under different application scenarios to validate the effectiveness of the proposed method. The results indicate that cross-view geo-localization in this realistic setting is still challenging, fostering new research in this direction. Our dataset and code will be released at https://github.com/JeffZilence/VIGOR.
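
For the meter-level evaluation the abstract mentions, the error between a predicted and a ground-truth GPS coordinate is conventionally scored with the haversine distance; a generic sketch, not VIGOR's released evaluation code:

```python
import math

def gps_error_meters(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in meters between two
    latitude/longitude pairs, using a mean Earth radius."""
    r = 6371000.0  # mean Earth radius, meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlam = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))
```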

  • Research Article
  • Cited by 6
  • 10.1109/embc.2014.6944433
Tracking gaze while walking on a treadmill: spatial accuracy and limits of use of a stationary remote eye-tracker.
  • Aug 1, 2014
  • Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
  • V Serchi + 3 more

Inaccurate visual sampling and foot placement may lead to unsafe walking. Virtual environments that challenge obstacle negotiation may be used to investigate the relationship between the point of gaze and stepping accuracy. A measurement of the point of gaze during walking can be obtained using a remote eye-tracker. Assessing its performance and limits of applicability is essential to define the areas of interest in a virtual environment and to collect information for the analysis of visual strategy. The current study aims at characterizing a remote eye-tracker in static and dynamic conditions. Three conditions were analyzed: (a) looking at a single stimulus during selected head movements, (b) looking at multiple stimuli distributed on the screen from different distances, and (c) looking at multiple stimuli distributed on the screen while walking. The eye-tracker was able to measure the point of gaze during head motion along the medio-lateral and vertical directions consistently with the device specifications, while tracking during head motion along the anterior-posterior direction fell short of the device specifications. During head rotation around the vertical axis, the error of the point of gaze was lower than 23 mm. The best accuracy (10 mm) was achieved, consistent with the device specifications, in the static condition performed at 650 mm from the eye-tracker, while point-of-gaze data were lost when moving closer to the eye-tracker. In general, the accuracy and precision of the point of gaze were not related to the stimulus position. During fast walking (1.1 m/s), the eye-tracker did not lose any data, since the head range of motion always remained within the ranges of trackability. The values of accuracy and precision during walking were similar to those in static conditions. These values will be considered in defining the size and shape of the areas of interest in the virtual environment.
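
The accuracy and precision figures quoted follow the usual eye-tracking definitions; a sketch computing both from gaze samples recorded while fixating a known stimulus (these are the standard definitions, not taken from the paper's methods section):

```python
import numpy as np

def accuracy_and_precision(gaze_xy, target_xy):
    """Accuracy: mean Euclidean offset of gaze samples from the target.
    Precision: RMS of sample-to-sample displacements (dispersion)."""
    g = np.asarray(gaze_xy, dtype=float)
    accuracy = np.linalg.norm(g - np.asarray(target_xy, dtype=float), axis=1).mean()
    precision = np.sqrt((np.linalg.norm(np.diff(g, axis=0), axis=1) ** 2).mean())
    return accuracy, precision
```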

  • Research Article
  • Cited by 2
  • 10.1016/j.patrec.2020.04.003
Joint image restoration and matching method based on distance-weighted sparse representation prior
  • Apr 26, 2020
  • Pattern Recognition Letters
  • Yuanjie Shao + 4 more

  • Conference Article
  • 10.1109/niles50944.2020.9257980
Robust Target Detection in Optical Scene Based on Multiple Reference Images
  • Oct 24, 2020
  • Mohamed M Kamel + 3 more

Target detection has a wide spectrum of promising applications in image processing. Several image matching techniques using feature descriptors and detectors that can be used for target detection have been introduced in the literature. These techniques achieve the detection task accurately when the reference and scene images are captured by the same sensor. On the other hand, the performance of these matching techniques degrades if the scene and reference images were captured by different sensors, because of the image transformation and deformation problems that occur. This paper introduces a robust technique that enhances the performance of target detection. We argue that the proposed technique differs significantly from many recent target detection techniques, as it is based mainly on a voting process that selects the best matches between the reference images and the scene image. The proposed technique emphasizes the features of objects in multiple reference images with different perspective angles to enhance the matching task. Experimental results with real images illustrate the efficiency of this approach. The accuracy percentage for the proposed technique is 48.4615%. The proposed technique outperforms recent techniques and increases the resilience of the image matching task against image transformation and deformation problems. Finally, the performance analysis is accomplished using three metrics: number of matches, execution time, and accuracy.
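
A hedged sketch of match-count voting across multiple reference images using ORB features; the descriptor choice and the exact voting rule here are assumptions, not the paper's specification:

```python
import cv2
import numpy as np

def best_reference_by_votes(scene, references):
    """Return the index of the reference image collecting the most
    cross-checked ORB matches against the scene, plus all vote counts."""
    orb = cv2.ORB_create()
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    _, scene_des = orb.detectAndCompute(scene, None)
    votes = []
    for ref in references:
        _, ref_des = orb.detectAndCompute(ref, None)
        if scene_des is None or ref_des is None:
            votes.append(0)
        else:
            votes.append(len(matcher.match(ref_des, scene_des)))
    return int(np.argmax(votes)), votes
```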
