Abstract

Training and validation sets of labeled data are important components used in supervised learning to build a classification model. During training, most learning algorithms use all images from the given training set to estimate the model’s parameters. Particularly for video classification, it is required a keyframe extraction technique in order to select representative frames for training, which commonly is based on simple heuristics such as low level features frame difference. As some learning algorithms are noise sensitive, it is important to carefully select frames for training so that the model’s optimization is accomplished more accurately and faster. We propose in this paper to analyze four methodologies for selecting representative frames of a periocular video database. One of them is based on the thresholds calculation (T), the other is a modified Kennard-Stone (KS) model, the thir method is based on sum of absolute difference in LUV colorspace and the last one is random sampling. To evaluate the selected image sets we use two deep network methodologies: feature extraction (FE) and fine tuning (FT). The results show that with a reduced amount of training images we can achieve the same accuracy of the complete database using the modified KS refinement methodology and the FT evaluation method.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.