Abstract

Event cameras (ECs) have steadily demonstrated their value in a range of vision applications thanks to energy-efficient sparse sensing, high dynamic range, and extremely high temporal resolution. However, their use in facial processing tasks has remained limited. To achieve high energy efficiency in large-pose face alignment, a crucial facial pre-processing stage, we leverage ECs by adapting the processing rate to the intensity of facial movement. To this end, we propose a novel alternative to the commonly employed constant-time-frame and event-count-frame strategies that combines their advantages while retaining the benefits of supervised learning. This is realized by a multi-timescale boosting framework that generates highly sparse pose-events at a variable rate via detection-based online timescale selection. Although detectors at multiple timescales with boosted sensitivities operate as a cascade, our method incurs minimal delay, which is essential for real-time applications. Comprehensive evaluations show that the proposed multi-timescale processing substantially improves the performance-efficiency trade-off over single-timescale frames and, even more markedly, over event-count frames. Computational cost ranges from 2.5 MFLOPS on moderate-motion clips to 6.5 MFLOPS on intense-motion clips, with negligible computation in the absence of activity. Moreover, alignment errors are considerably reduced by the online selection of small timescales during fast head motion and of larger timescales during slower motion or local activity of the lips and eyes. Being orthogonal and complementary to spatial-domain techniques, the proposed approach can be conveniently integrated with future advances for further performance/efficiency improvements or alignment extensions.
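
As an illustration of the detection-based online timescale selection described above, the following Python sketch shows one plausible way such a cascade could be organized. The timescale values, frame resolution, and detector interface (`accumulate_frame`, `emit_pose_event`, per-scale thresholds) are our own assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

# A minimal sketch of detection-based online timescale selection.
# Timescales, resolution, and detector API are illustrative assumptions.
TIMESCALES_MS = (5, 10, 20, 40)  # assumed candidate accumulation windows

def accumulate_frame(events, t_now, window_ms, shape=(180, 240)):
    """Event-count frame over the trailing `window_ms` window.

    `events` is an (N, 4) array of (t_ms, x, y, polarity) rows.
    """
    frame = np.zeros(shape, dtype=np.float32)
    recent = events[(events[:, 0] >= t_now - window_ms) & (events[:, 0] <= t_now)]
    np.add.at(frame, (recent[:, 2].astype(int), recent[:, 1].astype(int)), 1.0)
    return frame

def emit_pose_event(events, t_now, detectors, thresholds):
    """Cascade from the shortest timescale upward.

    Each detector maps a frame to (pose, confidence); the first detector
    whose confidence clears its boosted threshold emits a pose-event.
    Fast motion is thus resolved at small timescales with minimal added
    latency, while slow or local motion falls through to larger ones.
    """
    for window_ms, detector, thr in zip(TIMESCALES_MS, detectors, thresholds):
        frame = accumulate_frame(events, t_now, window_ms)
        pose, confidence = detector(frame)
        if confidence >= thr:
            return pose, window_ms
    return None, None  # negligible activity: skip the update entirely
```

Cascading from the cheapest (shortest) timescale first is one way to reconcile variable-rate output with low latency: a confident detection at a small timescale terminates the cascade early, and only ambiguous or low-activity inputs pay for the larger accumulation windows.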
