Revamping Image-Recipe Cross-Modal Retrieval with Dual Cross Attention Encoders

Wenhao Liu, Limeng Gao, Zhen Wang, Xinyi Chang, Simiao Yuan, Zhenrui Zhang

doi:10.3390/math12203181

Abstract

The image-recipe cross-modal retrieval task, which retrieves relevant recipes for a given food image and vice versa, is attracting widespread attention. The task faces two main challenges. First, the components of a recipe (words in a sentence, sentences in an entity, and entities in a recipe) differ in importance. If all components receive the same weight, the recipe embedding cannot focus on the important ones, which therefore contribute less to retrieval. Second, food images exhibit strong locality: only the local food regions matter, and enhancing the discriminative features of these regions remains difficult. To address these two problems, we propose a novel framework named Dual Cross Attention Encoders for Cross-modal Food Retrieval (DCA-Food). The framework consists of a hierarchical cross attention recipe encoder (HCARE) and a cross attention image encoder (CAIE). HCARE uses three types of cross attention modules to capture the important words in a sentence, the important sentences in an entity, and the important entities in a recipe, respectively. CAIE extracts global and local region features and then computes cross attention between them to enhance the discriminative local features of food images. We conduct ablation studies to validate our design choices. Our approach outperforms existing approaches by a large margin on the Recipe1M dataset; specifically, it improves R@1 by +2.7 and +1.9 on the 1k and 10k test sets, respectively.
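The abstract describes the two encoders only at a high level. The sketch below is a minimal, hypothetical illustration in plain Python of the ideas it names: scaled dot-product cross attention, a word-to-sentence-to-entity-to-recipe hierarchy in the spirit of HCARE, global-to-local attention over image regions in the spirit of CAIE, and the image-to-recipe R@1 metric. All function names, the use of one query vector per hierarchy level, the absence of learned projections, and the rule that adds the attended region summary to the global feature are assumptions for illustration, not the authors' implementation. "Entities" are assumed to be recipe parts such as the title, ingredients and instructions found in Recipe1M.

```python
import math
import random

Vec = list[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: list[float]) -> list[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query: Vec, items: list[Vec]) -> tuple[Vec, list[float]]:
    """Scaled dot-product attention of one query over a set of items.

    Simplification (assumption): the items serve directly as keys and
    values, with no learned projections and a single head.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in items])
    dim = len(items[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, items)) for i in range(dim)]
    return pooled, weights


def recipe_embedding(recipe: list[list[list[Vec]]], queries: dict[str, Vec]) -> Vec:
    """Hypothetical HCARE-style hierarchy: words -> sentence -> entity -> recipe.

    `recipe` is a list of entities, each a list of sentences, each a list of
    word vectors. `queries` holds one assumed learned query vector per level.
    """
    entity_vecs = []
    for entity in recipe:
        sentence_vecs = [cross_attention(queries["word"], s)[0] for s in entity]
        entity_vecs.append(cross_attention(queries["sentence"], sentence_vecs)[0])
    return cross_attention(queries["entity"], entity_vecs)[0]


def image_embedding(global_feat: Vec, regions: list[Vec]) -> Vec:
    """Hypothetical CAIE-style fusion: the global feature attends over local
    region features and the attended summary is added to the global feature."""
    attended, _ = cross_attention(global_feat, regions)
    return [g + a for g, a in zip(global_feat, attended)]


def recall_at_1(image_embs: list[Vec], recipe_embs: list[Vec]) -> float:
    """Image-to-recipe R@1: query i is a hit if recipe i scores highest.

    Uses dot-product similarity; real systems typically L2-normalize first.
    """
    hits = 0
    for i, img in enumerate(image_embs):
        sims = [dot(img, r) for r in recipe_embs]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(image_embs)


def rand_vec(dim: int) -> Vec:
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


if __name__ == "__main__":
    random.seed(0)
    d = 8
    queries = {level: rand_vec(d) for level in ("word", "sentence", "entity")}
    # Toy recipe: 3 entities, each with 2 sentences of 4 word vectors.
    recipe = [[[rand_vec(d) for _ in range(4)] for _ in range(2)] for _ in range(3)]
    recipe_vec = recipe_embedding(recipe, queries)
    image_vec = image_embedding(rand_vec(d), [rand_vec(d) for _ in range(5)])
    print(len(recipe_vec), len(image_vec))  # 8 8
```

Attention weights at each level sum to one, so important words, sentences, entities or regions dominate the pooled vector instead of being averaged uniformly. A trained encoder would add learned query/key/value projections, multiple heads and pretrained text and image backbones.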


Journal: Mathematics
Publication Date: Oct 11, 2024
License type: CC BY 4.0

Similar Papers

Multi-Level Joint Feature Learning for Person Re-Identification
Shaojun Wu ... Ling Gao
Algorithms, Vol. 13, 29 Apr 2020

Dual-stream feature fusion network for person re-identification
Wenbin Zhang ... Zhihua Liu
Engineering Applications of Artificial Intelligence, Vol. 131, 15 Jan 2024

Local Heterogeneous Features for Person Re-Identification in Harsh Environments
Haijia Zhang ... Hao Ma
IEEE Access, Vol. 8, 01 Jan 2020

Discriminant local features selection using efficient density estimation in a large database
Alexis Joly ... Olivier Buisson
10 Nov 2005
