Abstract

Distracted driving is one of the leading causes of fatal road accidents. Current studies mainly use convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to classify distracted actions from spatial and spectral information. Following the successful application of the transformer in natural language processing (NLP), transformers have been introduced to computer vision tasks. A vision transformer can mine long-range relationships and loses less information between layers. Compared with a plain vision transformer, a hierarchical transformer whose representations are computed with shifted windows limits self-attention to local windows, yielding greater computational efficiency. In this work, we review shifted-window hierarchical vision transformers, following the exact implementation of the Swin Transformer, for classifying distracted drivers on the American University in Cairo Distracted Driver Dataset (AUC-DDD). Results show that the shifted-window hierarchical transformer achieves a classification accuracy of 95.72% in distracted driver detection.
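The efficiency claim above comes from restricting self-attention to non-overlapping local windows. A minimal NumPy sketch (illustrative only; the shapes 56×56×96 and window size 7 are typical Swin-T stage-1 values, not taken from this paper) shows the window partition and the resulting drop in attention cost:

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping
    (window_size, window_size, C) windows."""
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size,
                  W // window_size, window_size, C)
    # -> (num_windows, window_size, window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)

H = W = 56   # feature-map height/width (assumed stage-1 resolution)
M = 7        # window size (assumed)
C = 96       # channels (assumed)
x = np.zeros((H, W, C))

windows = window_partition(x, M)
print(windows.shape)  # (64, 7, 7, 96)

# Global attention compares all H*W tokens pairwise; windowed attention
# only compares tokens within each M*M window.
global_pairs = (H * W) ** 2
windowed_pairs = windows.shape[0] * (M * M) ** 2
print(global_pairs // windowed_pairs)  # 64x fewer token pairs

# The "shift" between consecutive layers is just a cyclic roll of the
# feature map, so windows in the next layer straddle the old boundaries:
shifted = np.roll(x, shift=(-(M // 2), -(M // 2)), axis=(0, 1))
```

The reduction factor equals the number of windows (H·W / M²), which is why windowed attention scales linearly rather than quadratically with image size.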
