Abstract

We propose a new approach for Extreme States Classification (ESC) on feature spaces of facial cues in sign language (SL) videos. The method is built upon Active Appearance Model (AAM) face tracking and feature extraction from global and local AAMs. ESC is applied to various facial cues, such as pose rotations, head movements and eye blinking, leading to the detection of extreme states such as left/right, up/down and open/closed. Given the importance of such facial events in SL analysis, we apply ESC to detect visual events in SL videos from both American (ASL) and Greek (GSL) corpora, yielding promising qualitative and quantitative results. Further, we show the potential of ESC for assistive annotation tools and demonstrate how the detections relate to indicative higher-level linguistic events. Given the lack of facial annotated data and the fact that manual annotation is highly time-consuming, the ESC results indicate that the framework can have a significant impact on SL processing and analysis.
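
The abstract does not spell out the classification rule, so the following is only a minimal sketch of the general idea, under the assumption that each frame yields a single scalar cue derived from the AAM parameters (for example a yaw-related shape coefficient or a normalized eyelid opening): frames whose cue values fall in the tails of the empirical distribution are labeled as the two extreme states, everything else as neutral. The function name, the percentile thresholds and the cue itself are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def classify_extreme_states(cue, low_pct=10.0, high_pct=90.0):
    """Illustrative sketch only (not the paper's exact algorithm).

    Labels each frame of a scalar AAM-derived cue as
    -1 (extreme low, e.g. left / down / closed),
     0 (neutral), or
    +1 (extreme high, e.g. right / up / open),
    using the tails of the cue's empirical distribution.
    """
    cue = np.asarray(cue, dtype=float)
    lo, hi = np.percentile(cue, [low_pct, high_pct])  # tail thresholds
    labels = np.zeros(cue.shape, dtype=int)
    labels[cue <= lo] = -1
    labels[cue >= hi] = 1
    return labels

# Toy usage with a synthetic "yaw" cue over a few frames.
yaw = [0.0, -0.8, -0.9, 0.1, 0.05, 0.9, 1.0, 0.02, -0.1]
print(classify_extreme_states(yaw))  # -1 for the most leftward frames, +1 for the most rightward
```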

Highlights

  • Facial events are inextricably linked with human communication and are essential for gesture and sign language (SL) comprehension

  • In the experimental results, we present qualitative results on Greek sign language (GSL), which lacks annotations (Section 6.1); a quantitative comparison between Extreme States Classification (ESC), supervised classification and k-means clustering on American sign language (ASL) (Section 6.2); a quantitative evaluation of the effect of Active Appearance Model (AAM) fitting accuracy on ESC performance (Section 6.3); and a subject-independent application on the IMM database (Section 6.4) (a toy sketch of such a clustering baseline follows this list)

  • Even though the task is easier (the IMM data contain clearer extreme poses than SL videos), these results indicate that ESC is subject-independent
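
As a rough illustration of the kind of unsupervised baseline referred to in the comparison above, the sketch below clusters a scalar cue with a simple 1-D 2-means and reports which frames fall in the lower-mean cluster; the eyelid-distance cue, the initialization and the two-cluster setup are assumptions for illustration only, not the paper's experimental protocol.

```python
import numpy as np

def two_means_1d(cue, n_iter=50):
    """Illustrative 1-D 2-means baseline (not the paper's setup).

    Clusters a scalar cue into two groups and returns 0/1 labels,
    where 0 is the cluster with the smaller mean."""
    x = np.asarray(cue, dtype=float)
    centers = np.array([x.min(), x.max()])              # simple initialization
    for _ in range(n_iter):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean()       # update cluster means
    order = np.argsort(centers)                          # force 0 = lower-mean cluster
    return order.argsort()[labels]

# Toy usage: an "eyelid distance" cue where small values suggest closed eyes.
eyelid = [0.9, 0.85, 0.1, 0.08, 0.88, 0.12, 0.9]
print(two_means_1d(eyelid))  # 0 for the (assumed) closed-eye frames, 1 otherwise
```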

Introduction

Facial events are inextricably linked with human communication and are essential for gesture and sign language (SL) comprehension. From both the automatic visual processing and the recognition viewpoints, facial events are difficult to detect, describe and model. We focus on the detection of such low-level visual events in video sequences, which can prove important both for SL analysis and for automatic SL recognition (ASLR) [5,6]. SL video corpora are widely employed by linguists, annotators and computer scientists for the study of SL and the training of ASLR systems. All of the above require manual annotation of facial events, either for linguistic analysis or for ground-truth transcriptions, and this has led to efforts towards the development of automatic or semi-automatic annotation tools [12,13,14] for the processing of corpora.
