Abstract

This research centers on deep learning for food-related tasks, with a particular focus on recipe generation and cross-modal food retrieval. The goal is to improve both tasks through sentence-level tree structures learned with unsupervised methods. Recipe generation aims to produce cooking instructions from a food image, while cross-modal retrieval matches food images against the recipes that describe them, given a query in either modality. Translating directly from images to long instruction text is difficult, which motivates intermediate tree-structured representations to bridge the two modalities and ultimately improves performance. The Recipe1M dataset serves as the foundation of this research: it pairs food images with ingredient lists and cooking instructions. However, it differs significantly from typical image-text cross-modal datasets such as Flickr and MS-COCO, mainly because its cooking instructions span many sentences and individual ingredients lack images.

To address these challenges, a Structure-Aware Generative Network (SGN) is introduced that combines inferred tree structures with the training and generation process. An RNN generates the target tree structure conditioned on the food image, and the inferred tree is then incorporated into recipe generation with a graph attention network (GAT), yielding detailed, well-organized cooking instructions. The study also shows that the tree structures learned without supervision benefit cross-modal food retrieval. The img2tree and tree2recipe modules handle tree creation and tree use, respectively, within the recipe generation pipeline. The contributions of this research include the proposal of sentence-level recipe tree structures, the introduction of the img2tree and tree2recipe modules for creating and exploiting those trees, and the demonstration of performance improvements over existing methods. Experiments beyond the Recipe1M dataset further validate the effectiveness of the proposed approach.
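The tree2recipe idea of propagating information over an inferred recipe tree can be illustrated with a single graph-attention (GAT) layer. The sketch below is a minimal NumPy illustration, not the authors' implementation: the node features stand in for sentence embeddings at the tree nodes, the adjacency matrix encodes the tree's edges (with self-loops), and all weights (`W`, `a`) are random placeholders.

```python
import numpy as np

def gat_layer(h, adj, W, a, alpha=0.2):
    """One graph-attention layer over a recipe tree (illustrative sketch).

    h   : (N, F)   node features, e.g. sentence embeddings at tree nodes
    adj : (N, N)   adjacency of the tree, 1 where an edge exists
                   (self-loops should be included so every row has a neighbor)
    W   : (F, F2)  shared linear transform
    a   : (2*F2,)  attention vector
    """
    z = h @ W                                    # project features: (N, F2)
    N = z.shape[0]
    # Pairwise attention logits e_ij = LeakyReLU(a^T [z_i || z_j])
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            s = a @ np.concatenate([z[i], z[j]])
            e[i, j] = s if s > 0 else alpha * s  # LeakyReLU
    # Mask out non-edges, then softmax over each node's neighborhood
    e = np.where(adj > 0, e, -1e9)
    e = e - e.max(axis=1, keepdims=True)         # numerical stability
    att = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)
    return att @ z                               # aggregated features: (N, F2)
```

Each node's new representation is an attention-weighted mix of its neighbors' projected features, which is how tree structure can steer the wording of each generated instruction sentence.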
In summary, this study addresses the gap between food images and long, multi-sentence cooking instructions by employing unsupervised sentence-level tree learning. The approach structures generated recipes, enhances cross-modal food retrieval, and achieves state-of-the-art performance on the Recipe1M dataset; experiments and ablation studies assess its effectiveness thoroughly.

Keywords: Deep Learning, Unsupervised Learning, Cross-Modal Food Retrieval, Recipe Generation, Structure-Aware Generative Network, Sentence-Level Tree Structures, Recipe1M Dataset.
