Abstract

In this paper, we analyze effective methods for multi-label classification of image sets in the development of visual recommender systems. We propose a two-stage algorithm. At the first stage, a convolutional neural network is fine-tuned to extract visual features. At the second stage, the algorithm combines the feature vectors obtained for each image of the input set into a single descriptor, using modifications of a neural aggregation module based on linear squeezing of the feature space and an attention mechanism. We perform an experimental study on the Amazon Product dataset for the task of classifying customer interests from photos of the products they have purchased. We show that a one-level attention block with squeezing of the feature vectors achieves one of the highest F1 scores.
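The abstract does not specify the exact form of the squeezing layer or the attention block. The sketch below is a minimal illustration under a common formulation in the style of neural aggregation networks, which is an assumption rather than the authors' method. Each image's CNN feature vector is linearly projected to a lower dimension ("squeezing"). A single learned query vector then scores every image, the scores are softmax-normalized over the set, and the weighted sum serves as the set descriptor. A sigmoid-per-class head then produces multi-label predictions. The function names (`squeeze`, `attention_aggregate`, `predict_labels`), the dimensions, and the randomly sampled weights are hypothetical. In a trained system, `W`, `q`, and the classifier weights would be learned.

```python
import numpy as np

def squeeze(features: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Linear squeezing (assumed form): map each D-dim CNN feature vector to d < D dims.
    return features @ W + b

def attention_aggregate(features: np.ndarray, q: np.ndarray):
    # One-level attention block (assumed form): score each image with query q,
    # softmax the scores over the set, and return the weighted sum as one descriptor.
    scores = features @ q
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()
    return weights @ features, weights

def predict_labels(descriptor: np.ndarray, V: np.ndarray, c: np.ndarray, threshold: float = 0.5):
    # Multi-label head: an independent sigmoid per interest category, then a threshold.
    probs = 1.0 / (1.0 + np.exp(-(descriptor @ V + c)))
    return probs >= threshold, probs

rng = np.random.default_rng(0)
N, D, d, C = 7, 1024, 256, 10  # images in the set, CNN dim, squeezed dim, classes (illustrative)
features = rng.normal(size=(N, D))  # stand-in for fine-tuned CNN features of one customer's photos
W, b = rng.normal(scale=D ** -0.5, size=(D, d)), np.zeros(d)  # hypothetical squeezing weights
q = rng.normal(scale=d ** -0.5, size=d)                       # hypothetical attention query
V, c = rng.normal(scale=d ** -0.5, size=(d, C)), np.zeros(C)  # hypothetical classifier

descriptor, weights = attention_aggregate(squeeze(features, W, b), q)
labels, probs = predict_labels(descriptor, V, c)
print(descriptor.shape, weights.round(3), labels)
```

Because the attention weights sum to one over the set, the descriptor has the same size `d` however many photos a customer has, and it does not depend on the order of the photos. This is what lets a fixed-size classifier operate on variable-size image sets.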
