Abstract

Pupil detection is an indispensable part of eye tracking. Because existing methods are limited by pupil image quality, we propose a pupil detection method based on a vision transformer with a hybrid structure. We first extract local image features with a CNN and then capture global dependencies through the transformer encoder, yielding more accurate pupil-position information. We trained and tested the proposed model on 10 600 images from three publicly available datasets and compared it with other pupil detection models. The results show that the hybrid vision transformer outperforms the comparison approaches in both accuracy and robustness when locating the pupil position, achieving a detection rate of more than 90% within a 5-pixel error on all evaluated datasets.
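The detection-rate metric used above (the fraction of images whose predicted pupil center falls within a given pixel error of the ground truth) can be sketched in plain Python. This is an illustrative sketch only; the function name and the sample coordinates are assumptions, not taken from the paper or its datasets.

```python
import math

def detection_rate(predicted, ground_truth, threshold=5.0):
    """Fraction of predictions whose Euclidean distance to the
    ground-truth pupil center is within `threshold` pixels."""
    hits = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return hits / len(predicted)

# Illustrative coordinates (not from the paper's datasets):
# two predictions land within 5 px of the ground truth, one misses.
pred = [(100.0, 80.0), (52.0, 61.0), (200.0, 150.0)]
truth = [(102.0, 83.0), (50.0, 60.0), (210.0, 150.0)]
rate = detection_rate(pred, truth)  # 2 of 3 within 5 px -> 0.666...
```

A detection rate above 0.9 under this metric, as reported in the abstract, means more than 90% of test images had their pupil center localized to within 5 pixels.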
