Abstract

Sign language is the primary medium of communication for many people who are deaf or hard of hearing. Members of this community access online sign language (SL) content posted on video-sharing sites to stay informed. Unfortunately, locating SL videos can be difficult because text-based search on these sites relies on metadata rather than on the video content itself. Low-cost or real-time video classification techniques would be invaluable for improving access to this content. Our prior work developed a technique to identify SL content from video features alone, but it is computationally expensive. Here we describe and evaluate three optimization strategies that have the potential to reduce computation time without overly impacting precision and recall. Two optimizations reduce the cost of face detection, while the third analyzes shorter segments of the video. Our results identify a combination of these techniques that yields a 96% reduction in computation time while losing only 1% in F1 score. To reduce computation further, we also explore a keyframe-based approach that achieves comparable recall but lower precision than the techniques above, making it appropriate as an early filter in a staged classifier.
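To make the two kinds of optimization concrete, the sketch below illustrates their general flavor: running a face detector only on a subsampled set of frames and only over a short prefix of the video. This is a minimal illustration assuming OpenCV, not the paper's actual pipeline; the function name, parameters (`sample_every_n`, `max_seconds`), and the Haar-cascade detector are assumptions chosen for the example.

```python
import cv2

def face_hit_rate(path, sample_every_n=10, max_seconds=30):
    """Illustrative sketch (not the paper's method): estimate how often a
    face is detected, checking only every Nth frame of a short video prefix
    to cut face-detection cost."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    hits, checked, idx = 0, 0, 0
    while True:
        ok, frame = cap.read()
        # Stop at end of file or once the short analysis window is exhausted.
        if not ok or idx / fps > max_seconds:
            break
        # Subsample frames so the detector runs far less often.
        if idx % sample_every_n == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            checked += 1
            if len(faces) > 0:
                hits += 1
        idx += 1
    cap.release()
    return hits / checked if checked else 0.0
```

In a staged classifier of the kind the abstract mentions, a cheap signal like this could serve as an early filter, with the more expensive feature-based analysis reserved for videos that pass it.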
