Abstract

Road accidents have been a significant problem in recent years, and statistics attribute them primarily to driver drowsiness. Many lives have been lost as a result, so a reliable detection system is needed. For this analysis, we chose a sizable, realistic drowsiness video dataset created by the University of Texas and kept only its two extreme classes of videos, alert and drowsy. We then built two distinct models: Model-A for temporal features and Model-B for spatiotemporal features. In Model-A, a computer vision technique, YOLOv3, is used to extract temporal features, which are then processed by a long short-term memory (LSTM) network. Here, we addressed the occlusion issue by imposing a condition on each frame. Discarding occluded frames during this procedure leads to overfitting, which we handle with TransGAN-based augmentation. Model-B, on the other hand, extracts spatial information using a convolutional neural network (CNN), InceptionV3, whose output is subsequently processed by an LSTM. Although Model-A is more complex and less accurate (86%) than Model-B (97.5%), the investigation shows that Model-A is much superior to Model-B in terms of training time. These differences are highlighted through the AUC-ROC score and confusion matrices.
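
The two evaluation measures named above can be illustrated with a minimal sketch. It assumes binary labels (1 = drowsy, 0 = alert) and per-video scores in [0, 1]. The function names `confusion_counts` and `roc_auc` and the sample values are hypothetical and are not taken from the paper's results.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, with 1 = drowsy and 0 = alert."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (drowsy, alert) pairs ranked correctly.

    A tie between a drowsy score and an alert score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data only (not the paper's results).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print(confusion_counts(labels, preds))
print(roc_auc(labels, scores))
```

The confusion counts depend on the chosen threshold, while the AUC-ROC score summarizes ranking quality across all thresholds, which is why the two are usually reported together.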
