A Multimodal Fusion Framework for Brand Recognition from Product Image and Context

Changbo Hu, Zhen Zhang, Qun Li, Ruofei Zhang, Keng-Hao Chang

doi:10.1109/icmew46912.2020.9105947

Abstract

Detecting and recognizing a brand from product images is a key capability in many computer vision and machine learning applications. Logo detection in images is one of the most distinctive and effective ways to determine the brand. However, because logos vary widely in scale, geometry, and appearance, logo detection and recognition remain challenging problems, even with recent advances in deep neural networks. Another informative source for brand recognition is the textual context: on most e-commerce websites, a product picture is accompanied by a text description. To combine the best of both worlds, we propose to tackle brand recognition with a multimodal fusion framework that integrates image-based logo recognition using convolutional neural networks with context-based brand recognition using natural language understanding models, where the context features include, among others, the product image title, the description, and text detected in the image by OCR. Our experiments demonstrate that the additional context information significantly mitigates the limitations of image-only logo recognition. To better represent text within its context, the framework uses text embeddings produced by BERT.
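
To illustrate how two modalities might be combined, the following is a minimal sketch of weighted late fusion over per-brand scores. The abstract does not specify the fusion mechanism, so the weighted average, the function names (`softmax`, `fuse`, `predict_brand`), the `image_weight` parameter, and the example brands and scores are all assumptions made for illustration, not the authors' method.

```python
import math


def softmax(scores):
    """Turn raw per-brand scores into probabilities (numerically stable)."""
    peak = max(scores.values())
    exps = {brand: math.exp(s - peak) for brand, s in scores.items()}
    total = sum(exps.values())
    return {brand: e / total for brand, e in exps.items()}


def fuse(image_scores, text_scores, image_weight=0.5):
    """Weighted average of the two modalities' brand probabilities.

    Assumption: simple late fusion; the paper may fuse differently.
    """
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must be in [0, 1]")
    image_probs = softmax(image_scores)
    text_probs = softmax(text_scores)
    brands = set(image_probs) | set(text_probs)
    return {
        brand: image_weight * image_probs.get(brand, 0.0)
        + (1.0 - image_weight) * text_probs.get(brand, 0.0)
        for brand in brands
    }


def predict_brand(image_scores, text_scores, image_weight=0.5):
    """Return the brand with the highest fused probability."""
    fused = fuse(image_scores, text_scores, image_weight)
    return max(fused, key=fused.get)


# Hypothetical scores: the logo model is unsure between two brands,
# while the text model (title, description, OCR) clearly favors one.
image_scores = {"acme": 1.2, "globex": 1.0, "initech": 0.1}
text_scores = {"acme": 0.2, "globex": 2.5, "initech": 0.1}

print(predict_brand(image_scores, text_scores))
```

In this toy case the image scores alone slightly favor one brand, but the confident text scores shift the fused decision to another, which is the kind of mitigation the abstract attributes to context information.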
