Face extraction and clustering with Viola-Jones object detection framework and t-SNE dimensionality reduction

S Milutinovici

doi:10.21279/1454-864x-23-i2-012

Abstract

We investigate the possibility of using the Viola-Jones [1] object detection framework in a multi-model approach to build a face extraction pipeline for tagging appearances in videos. Although deep convolutional neural networks (CNNs) have surpassed earlier algorithms in performance [2], Haar cascades need much less memory than CNNs, do not require specialized hardware, and have lower storage requirements. Most videos show the same face more than once, including at least a few full-frontal, well-lit close-ups, so an efficient system is needed to extract the best of these appearances. This study presents the selection of pre-trained models, the fine-tuning of run-time parameters, and the testing of the resulting pipeline. After selecting models for faces, eyes, mouths and noses and testing suitable run-time parameters, we were able to establish a procedure that avoids false positives and produces a set of well-defined faces.
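To illustrate how a multi-model approach can suppress false positives, the sketch below confirms a face candidate only when separate eye, nose and mouth detections fall in plausible regions of the face box, then keeps the largest confirmed face as the "best" appearance. This is not the paper's procedure: the `Box` type, the helper names, the vertical-band thresholds and the use of box area as a proxy for a close-up are all hypothetical assumptions. In a real pipeline the boxes would come from OpenCV's `cv2.CascadeClassifier.detectMultiScale`, whose `scaleFactor`, `minNeighbors` and `minSize` arguments are the kind of run-time parameters the abstract refers to tuning. The code below takes the boxes as plain tuples so it runs without OpenCV.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Detection box in (x, y, w, h) form, the layout used by
    cv2.CascadeClassifier.detectMultiScale (hypothetical wrapper type)."""
    x: int
    y: int
    w: int
    h: int

    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)


def _count_in_band(face: Box, parts: Sequence[Box], top: float, bottom: float) -> int:
    """Count part boxes whose center lies horizontally inside the face box and
    vertically within [top, bottom], given as fractions of the face height."""
    if face.h <= 0:
        return 0
    count = 0
    for part in parts:
        cx, cy = part.center()
        inside_x = face.x <= cx <= face.x + face.w
        rel_y = (cy - face.y) / face.h
        if inside_x and top <= rel_y <= bottom:
            count += 1
    return count


def is_confirmed_face(face: Box, eyes: Sequence[Box],
                      noses: Sequence[Box], mouths: Sequence[Box]) -> bool:
    """Hypothetical layout rule: two eyes in the upper part of the face,
    a nose in the middle band and a mouth in the lower part."""
    return (
        _count_in_band(face, eyes, 0.0, 0.55) >= 2
        and _count_in_band(face, noses, 0.35, 0.75) >= 1
        and _count_in_band(face, mouths, 0.6, 1.0) >= 1
    )


def best_appearance(
    candidates: List[Tuple[Box, Sequence[Box], Sequence[Box], Sequence[Box]]]
) -> Optional[Box]:
    """Drop unconfirmed candidates, then return the largest remaining face
    (area used as an assumed proxy for a close-up), or None if none survive."""
    confirmed = [face for face, eyes, noses, mouths in candidates
                 if is_confirmed_face(face, eyes, noses, mouths)]
    return max(confirmed, key=lambda b: b.w * b.h, default=None)


if __name__ == "__main__":
    face = Box(100, 100, 200, 200)
    eyes = [Box(140, 150, 30, 20), Box(230, 150, 30, 20)]
    noses = [Box(185, 200, 30, 40)]
    mouths = [Box(160, 250, 80, 30)]
    spurious = Box(0, 0, 400, 400)  # large detection with no supporting parts
    print(best_appearance([(face, eyes, noses, mouths), (spurious, [], [], [])]))
```

With OpenCV, each array returned by `detectMultiScale` could be converted with `[Box(*map(int, r)) for r in rects]`, running the eye, nose and mouth cascades on the face region or on the whole frame. Requiring agreement between independent cascades trades some recall for precision, which matches the abstract's goal of obtaining only well-defined faces.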

Full Text