Abstract

We investigate the possibility of using the Viola-Jones [1] object detection framework, through a multi-model approach, to build a face extraction pipeline for video appearance tagging. Although deep convolutional neural networks (CNNs) have surpassed earlier algorithms in performance [2], Haar cascades need much less memory than CNNs, do not require specialized hardware, and have lower storage requirements. Most videos show the same face more than once, including at least a few close-ups that are full frontal and well lit. We therefore need an efficient system that extracts the best of these appearances. This study describes the selection of pre-trained models, the fine-tuning of run-time parameters, and the testing. After selecting models for faces, eyes, mouths and noses and testing suitable run-time parameters, we were able to establish a procedure that avoids any false positives and produces a set of well-defined faces.
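The abstract does not spell out how the face, eye, mouth and nose models are combined. The following is a minimal sketch that assumes one plausible rule. A face candidate is accepted only when the eye, nose and mouth detectors also fire inside it, in anatomically plausible bands. The "best appearance" is then taken to be the largest accepted face, as a proxy for a close-up. Rectangles are assumed to be `(x, y, w, h)` tuples in frame coordinates, which is the format OpenCV's `CascadeClassifier.detectMultiScale` returns. The band fractions and the helper names (`is_confirmed_face`, `best_appearance`) are hypothetical and are not the authors' parameters.

```python
from typing import List, Optional, Sequence, Tuple

# (x, y, w, h) rectangle, as produced by a Haar cascade detector; assumed
# here to be in full-frame coordinates for every model.
Box = Tuple[int, int, int, int]


def center(box: Box) -> Tuple[float, float]:
    """Center point of a rectangle."""
    x, y, w, h = box
    return x + w / 2, y + h / 2


def inside_band(point: Tuple[float, float], face: Box, top: float, bottom: float) -> bool:
    """True if the point lies horizontally within the face box and vertically
    between the given fractions (0 = top edge, 1 = bottom edge) of its height."""
    px, py = point
    x, y, w, h = face
    return x <= px <= x + w and y + top * h <= py <= y + bottom * h


def is_confirmed_face(face: Box, eyes: Sequence[Box], noses: Sequence[Box],
                      mouths: Sequence[Box]) -> bool:
    """Hypothetical multi-model check: accept the face only if two eyes fall in
    the upper half, a nose in the middle band and a mouth in the lower part."""
    n_eyes = sum(inside_band(center(e), face, 0.0, 0.5) for e in eyes)
    has_nose = any(inside_band(center(n), face, 0.3, 0.75) for n in noses)
    has_mouth = any(inside_band(center(m), face, 0.6, 1.0) for m in mouths)
    return n_eyes >= 2 and has_nose and has_mouth


def best_appearance(candidates: List[Tuple[int, Box, List[Box], List[Box], List[Box]]]
                    ) -> Optional[Tuple[int, Box]]:
    """candidates: (frame_index, face, eyes, noses, mouths) per face detection.
    Returns (frame_index, face) for the largest confirmed face, or None."""
    confirmed = [(idx, face) for idx, face, eyes, noses, mouths in candidates
                 if is_confirmed_face(face, eyes, noses, mouths)]
    if not confirmed:
        return None
    # Larger face area is used as a simple, assumed proxy for a close-up.
    return max(confirmed, key=lambda c: c[1][2] * c[1][3])
```

Requiring all three feature detectors to agree trades recall for precision. This fits the stated goal of avoiding false positives, at the cost of discarding some genuine faces that are turned or partly occluded.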

