A comparison among keyframe extraction techniques for CNN classification based on video periocular images

Carolina Toledo Ferraz, Osmando Pereira Junior, Tamiris Trevisan Negri Borges, Marcelo Garcia Manzato, William Barcellos, Adilson Gonzaga, José Hiroki Saito

doi:10.1007/s11042-020-10384-9

Abstract

Training and validation sets of labeled data are important components of supervised learning for building a classification model. During training, most learning algorithms use all images in the given training set to estimate the model’s parameters. Video classification in particular requires a keyframe extraction technique to select representative frames for training, and this selection is commonly based on simple heuristics such as the frame difference of low-level features. Because some learning algorithms are sensitive to noise, frames must be selected carefully so that the model is optimized more accurately and more quickly. In this paper we analyze four methodologies for selecting representative frames from a periocular video database. The first is based on threshold calculation (T), the second is a modified Kennard-Stone (KS) model, the third is based on the sum of absolute differences in the LUV color space, and the last is random sampling. To evaluate the selected image sets, we use two deep network methodologies: feature extraction (FE) and fine-tuning (FT). The results show that, with a reduced number of training images, the modified KS refinement methodology combined with the FT evaluation method achieves the same accuracy as training on the complete database.
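To make the Kennard-Stone idea mentioned in the abstract concrete, the following is a minimal sketch of the classic Kennard-Stone selection rule, not the modified variant the authors evaluate, whose details are not given here. It assumes each frame has already been reduced to a numeric descriptor; the function name `kennard_stone`, the toy descriptors, and the choice of Euclidean distance are illustrative assumptions.

```python
from math import dist


def kennard_stone(features, k):
    """Return indices of k frames chosen by the classic Kennard-Stone rule.

    `features` is a sequence of equal-length numeric descriptors, one per
    frame (a hypothetical representation, not the paper's). This is the
    textbook algorithm, not the authors' modified version.
    """
    n = len(features)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of frames")

    # Seed the selection with the two most distant frames.
    first, second = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda pair: dist(features[pair[0]], features[pair[1]]),
    )
    selected = [first, second]

    # For every frame, track the distance to its nearest selected frame.
    min_dist = [
        min(dist(f, features[first]), dist(f, features[second]))
        for f in features
    ]

    while len(selected) < k:
        # Pick the unselected frame that is farthest from the selected set.
        candidates = [idx for idx in range(n) if idx not in selected]
        nxt = max(candidates, key=lambda idx: min_dist[idx])
        selected.append(nxt)
        min_dist = [
            min(d, dist(f, features[nxt]))
            for d, f in zip(min_dist, features)
        ]
    return selected


# Toy one-dimensional descriptors standing in for five frames.
frames = [(0.0,), (1.0,), (2.0,), (10.0,), (5.0,)]
print(kennard_stone(frames, 4))  # [0, 3, 4, 2]
```

The rule favors frames that are spread out in descriptor space, which is one plausible reason a reduced training set can remain representative; near-duplicate frames such as index 1 above are chosen last.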

Full Text