Eye Center Localization (ECL) is a crucial technology for many computer vision applications, such as eye gaze estimation and eye tracking. Conventional implementations consist of two phases: locating the approximate eye regions, then finding the eye center position by extracting semantic features around each eye region. However, in such a two-phase pipeline, ECL accuracy depends not only on environmental factors, such as variation in camera angle and illumination and possible occlusion by eyelids or glasses, but also on the quality of the preceding steps. Inspired by the ensemble mechanism in machine learning, we formulate the ECL problem as an end-to-end voting process, whose core is to select a set of local descriptors that capture effective, independent information to vote for the eye centers. Using deep convolutional neural networks, we determine semantic descriptors around the eye regions. Each descriptor casts a vote pointing to the corresponding eye center, and the votes together determine the final eye centers. Experimental results on the public databases BioID and GI4E show that our method achieves 80.3% and 95.2% accuracy, respectively, outperforming existing state-of-the-art methods, and results on our custom, challenging database confirm the robustness of our method.
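
The voting idea can be illustrated with a minimal sketch. It is not the method of the paper: the abstract does not say how votes are represented or combined, so the sketch assumes each descriptor is a location plus a predicted offset to the eye center, and it combines the votes with a coordinate-wise median. The function name `aggregate_votes` and the sample descriptors are hypothetical.

```python
from statistics import median


def aggregate_votes(descriptors):
    """Combine per-descriptor votes into a single eye-center estimate.

    Each descriptor is a tuple (x, y, dx, dy): an image location plus a
    predicted offset to the eye center (an assumed representation).
    Its vote is the point (x + dx, y + dy). The coordinate-wise median
    is an assumed aggregation rule, chosen because it tolerates a few
    badly wrong votes.
    """
    if not descriptors:
        raise ValueError("at least one descriptor is required")
    votes = [(x + dx, y + dy) for x, y, dx, dy in descriptors]
    center_x = median(v[0] for v in votes)
    center_y = median(v[1] for v in votes)
    return center_x, center_y


# Hypothetical descriptors around one eye; the last one is an outlier,
# for example a descriptor that landed on a glasses frame.
descriptors = [
    (10.0, 20.0, 5.0, 0.0),    # votes for (15, 20)
    (20.0, 20.0, -5.0, 0.0),   # votes for (15, 20)
    (15.0, 10.0, 0.0, 11.0),   # votes for (15, 21)
    (15.0, 30.0, 1.0, -10.0),  # votes for (16, 20)
    (40.0, 40.0, 0.0, 0.0),    # outlier vote for (40, 40)
]

center = aggregate_votes(descriptors)
print(center)  # (15.0, 20.0)
```

In this toy case the single outlier vote does not move the estimate, which is the property that makes voting over many independent descriptors attractive under occlusion.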