Abstract
Driver drowsiness increases crash risk, leading to substantial road trauma each year. Drowsiness detection methods have received considerable attention, but few studies have investigated the implementation of a detection approach on a mobile phone. Phone applications reduce the need for specialised hardware and hence enable a cost-effective roll-out of the technology across the driving population. Although three-dimensional (3D) operations have been shown to be more suitable for spatiotemporal feature learning, current methods for drowsiness detection commonly use frame-based, multi-step approaches. However, computationally expensive techniques that achieve superior results on action recognition benchmarks (e.g. 3D convolutions, optical flow extraction) create bottlenecks for real-time, safety-critical applications on mobile devices. Here, we show how depthwise separable 3D convolutions, combined with an early fusion of spatial and temporal information, can achieve a balance between high prediction accuracy and real-time inference requirements. In particular, increased accuracy is achieved when assessment requires motion information, for example, when sunglasses conceal the eyes. Further, a custom TensorFlow-based smartphone application shows the true impact of various approaches on inference times and demonstrates the effectiveness of real-time monitoring based on out-of-sample data to alert a drowsy driver. Our model is pre-trained on ImageNet and Kinetics and fine-tuned on a publicly available Driver Drowsiness Detection dataset. Fine-tuning on large naturalistic driving datasets could further improve accuracy to obtain robust in-vehicle performance. Overall, our research is a step towards practical deep learning applications, potentially preventing micro-sleeps and reducing road trauma.
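To give a rough sense of why depthwise separable 3D convolutions ease the mobile bottleneck described above, the following sketch compares weight counts for a standard 3D convolution and a depthwise-plus-pointwise factorisation. The channel counts and kernel size are illustrative assumptions, not values from the paper, and the function names are hypothetical.

```python
# Illustrative weight counts only; layer sizes below are assumed, not taken
# from the paper's model. Biases are omitted for simplicity.

def conv3d_params(c_in, c_out, k_t, k_h, k_w):
    """Weights in a standard 3D convolution."""
    return c_in * c_out * k_t * k_h * k_w


def separable_conv3d_params(c_in, c_out, k_t, k_h, k_w):
    """Weights in a depthwise 3D convolution followed by a 1x1x1 pointwise one."""
    depthwise = c_in * k_t * k_h * k_w  # one spatiotemporal filter per input channel
    pointwise = c_in * c_out            # 1x1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 128 output channels, 3x3x3 kernel.
standard = conv3d_params(64, 128, 3, 3, 3)
separable = separable_conv3d_params(64, 128, 3, 3, 3)
ratio = standard / separable

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {ratio:.1f}x")
```

For this assumed layer the factorised form needs roughly 22 times fewer weights. The multiply-accumulate count per output position shrinks by the same factor, which is the kind of saving that matters for real-time inference on a phone.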
Highlights
Each year, motor vehicle accidents contribute to over 1.2 million fatalities globally [52]
Our model is pre-trained on ImageNet and Kinetics and fine-tuned on a publicly available Driver Drowsiness Detection dataset
The prediction accuracy of the Inflated 3D ConvNet (I3D) model is 5.8% higher than that of InceptionV1, which highlights the benefit of using 3D over 2D convolutions for action recognition
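The "inflation" that relates I3D to a 2D network such as InceptionV1 is commonly described as repeating each pre-trained 2D filter along the time axis and dividing by the temporal depth, so that a clip of identical frames yields the same response as the 2D filter on a single frame. The sketch below illustrates that idea on a toy kernel; it is an assumption-laden illustration with hypothetical function names, not the authors' code.

```python
# Toy illustration of 2D-to-3D filter inflation. Kernel and frame values
# are made up for the example.

def inflate_2d_kernel(kernel_2d, time_depth):
    """Repeat a 2D kernel time_depth times along time, rescaled by 1/time_depth."""
    scaled = [[w / time_depth for w in row] for row in kernel_2d]
    return [[row[:] for row in scaled] for _ in range(time_depth)]


def response_2d(kernel_2d, frame):
    """Response of a 2D kernel on a frame patch of the same size."""
    return sum(k * x for k_row, x_row in zip(kernel_2d, frame)
               for k, x in zip(k_row, x_row))


def response_3d(kernel_3d, clip):
    """Response of a 3D kernel on a clip (list of frame patches) of the same size."""
    return sum(response_2d(k_slice, frame) for k_slice, frame in zip(kernel_3d, clip))


kernel = [[1.0, 2.0], [3.0, 4.0]]
frame = [[1.0, 0.0], [2.0, 1.0]]
depth = 4

inflated = inflate_2d_kernel(kernel, depth)
static_clip = [frame] * depth  # the same frame repeated over time

print(response_2d(kernel, frame))
print(response_3d(inflated, static_clip))
```

Both calls print the same value, which is why weights pre-trained on images give the 3D network a sensible starting point before it learns motion-specific filters from video.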
Summary
Each year, motor vehicle accidents contribute to over 1.2 million fatalities globally [52]. In 95–99% of these crashes, human error, including driver drowsiness, is a contributing factor [14]. In the USA, crashes related to driver fatigue led to over 800 fatalities in 2014 and 37,000 injuries per year between 2005 and 2009 [29]. The association between driver drowsiness and crash risk has been confirmed in various studies. Williamson et al. [55] found that sleep homeostatic effects produce impaired performance and accidents. Cumulative sleep debt increases the risk of crashing, as shown in a case-control study of heavy-vehicle drivers [43]. Bouchner et al. [5] showed that drowsy drivers