Recognizing irrelevant faces in short-form videos based on feature fusion and active learning

Haizhou Wang, Mingcheng Zhu, Rongchuan Zhang

doi:10.1016/j.neucom.2022.06.064

Abstract

In recent years, short-form videos have spread rapidly around the world and become a popular form of entertainment through which people share their daily lives. However, many videos record the behavior of other people without their awareness and are then uploaded to short-form video platforms. Such behavior seriously invades personal privacy and can even lead to the leakage of personal information. At present, few studies focus on detecting privacy violations in short-form videos. Moreover, because existing models are difficult to transfer to the short-form video scenario and reliable datasets are lacking, recognizing irrelevant faces in short-form videos is very challenging. To address this problem, we constructed and published an irrelevant faces dataset (IF-Dataset) with 43,965 irrelevant face images and 89,924 relevant face images, built from videos collected from Douyin (the Chinese version of TikTok). In addition, we built a framework that implements our proposed deep learning model, the Multi-features Multi-head Fusion Network (MMFNet), to recognize irrelevant faces in short-form videos. The experimental results show that MMFNet reaches an F1 score of 87.03%. We also proposed a novel loss function and an active learning system to improve the generalization ability of models, achieving a Relative Error Reduction (RER) of up to 29.58%. Our work provides both theoretical and practical support for face protection in short-form videos.

Full Text