Blink detection is considered a useful indicator both for clinical conditions and drowsiness state. In this work, we propose and compare deep learning architectures for the task of detecting blinks in video frame sequences. The first step is the training and application of an eye detector that extracts the eye regions from each video frame. The cropped eye regions are organized as three-dimensional (3D) input with the third dimension spanning time of 300 ms. Two different 3D convolutional neural networks are utilized (a simple 3D CNN and 3D ResNet), as well as a 3D autoencoder combined with a classifier coupled to the latent space. Finally, we propose the usage of a frame prediction accumulator combined with morphological processing and watershed segmentation to detect blinks and determine their start and stop frame in previously unseen videos. The proposed framework was trained on ten (9) different participants and tested on five (8) different ones, with a total of 162,400 frames and 1172 blinks for each eye. The start and end frame of each blink in the dataset has been annotate by specialized ophthalmologist. Quantitative comparison with state-of-the-art blink detection methodologies provide favorable results for the proposed neural architectures coupled with the prediction accumulator, with the 3D ResNet being the best as well as the fastest performer.
Read full abstract