Abstract

In this paper, we examine a user modeling task based on processing the gallery of photos and videos on a mobile device. We propose a novel engine for preference prediction based on scene recognition, object detection, and facial analysis. First, all faces in a gallery are clustered, and all private photos and videos containing faces from large clusters are processed on the embedded system in offline mode. The remaining photos may be sent to a remote server to be analyzed by deeper, more sophisticated neural networks. The visual features of each photo are obtained from scene recognition and object detection models and then aggregated into a single descriptor by a neural attention unit. The proposed pipeline is implemented in a mobile Android application. Experimental results on the Photo Event Collection, the Web Image Dataset for Event Recognition, and Amazon Fashion data demonstrate that images can be processed efficiently without significant accuracy degradation.
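The attention-based aggregation step can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it assumes per-photo feature vectors (e.g. concatenated scene and object scores) and a single learned attention query vector, and shows how softmax attention weights would pool them into one gallery-level descriptor:

```python
import numpy as np

def attention_aggregate(features: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Pool per-photo features (n, d) into one descriptor (d,).

    `query` is a hypothetical learned attention vector of size d;
    in the paper this role is played by a trained neural attention unit.
    """
    scores = features @ query                  # (n,) relevance score per photo
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ features                  # weighted average descriptor (d,)

# Toy usage: 5 photos, 8-dimensional visual features
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 8))
q = rng.standard_normal(8)
descriptor = attention_aggregate(feats, q)
print(descriptor.shape)  # (8,)
```

With uniform scores this reduces to mean pooling; the attention weights let informative photos dominate the gallery descriptor.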
