Eye tracking and head pose estimation (HPE) have historically lacked reliability, interpretability, and comprehensibility. For instance, many works rely on traditional computer vision methods, which may not perform well in dynamic, realistic environments. Recently, a widespread trend has emerged of leveraging deep learning for HPE framed specifically as a regression task; however, for real-time applications the problem can be better formulated as classification (e.g., left, centre, or right head pose and gaze) using a hybrid approach. For the first time, we present a complete facial profiling approach that extracts micro and macro facial movement, gaze, and eye state features, which can be used for various applications related to comprehension analysis. The multi-model approach provides discrete, human-understandable head pose estimates using deep transfer learning, a newly introduced method of head roll calculation, gaze estimation via iris detection, and eye state estimation (i.e., open or closed). Unlike existing works, this approach automatically analyses the input image or video frame to produce human-understandable binary codes (e.g., eye open or closed, looking left or right) for each facial component (referred to as face channels). The proposed approach is validated on multiple standard datasets and outperforms existing methods in several respects, including reliability, generalisation, completeness, and interpretability. This work has significant implications for diverse domains, including psychological and cognitive tasks with a broad scope of applications, such as police interrogations and investigations, animal behaviour analysis, and smart applications such as driver behaviour analysis, student attention measurement, and automated camera flashes.
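The abstract describes the per-frame output as discrete, human-understandable binary codes over face channels (head pose, gaze, eye state). The sketch below is a minimal illustration of how such codes could be assembled from classified channel states; the `FaceChannels` container, the class labels, and the bit assignments are assumptions chosen for illustration and are not the authors' actual encoding.

```python
# Illustrative sketch only: the paper's encoding is not specified in the abstract,
# so the channel names, class labels, and bit mappings below are assumptions.
from dataclasses import dataclass


@dataclass
class FaceChannels:
    """Discrete, human-understandable states for one image or video frame."""
    head_pose: str  # assumed classes: "left", "centre", "right"
    gaze: str       # assumed classes: "left", "centre", "right"
    eye_state: str  # assumed classes: "open", "closed"


def encode_channels(channels: FaceChannels) -> str:
    """Map each channel's class to bits and concatenate into one binary code."""
    pose_bits = {"left": "00", "centre": "01", "right": "10"}
    gaze_bits = {"left": "00", "centre": "01", "right": "10"}
    eye_bits = {"open": "1", "closed": "0"}
    return (
        pose_bits[channels.head_pose]
        + gaze_bits[channels.gaze]
        + eye_bits[channels.eye_state]
    )


if __name__ == "__main__":
    # Example frame: head turned right, gaze to the left, eyes open.
    frame_state = FaceChannels(head_pose="right", gaze="left", eye_state="open")
    print(encode_channels(frame_state))  # prints "10001"
```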