Abstract

Image understanding in the era of deep learning is burgeoning, not only in terms of semantics but also towards the generation of meaningful descriptions of images. This requires cross-modal training of deep neural networks, which must be complex enough to encode the fine contextual information related to an image yet simple enough to cover a wide range of inputs. Converting a food image into its cooking description/instructions is a suitable instance of this image-understanding challenge. This paper proposes a method of obtaining compressed embeddings of the cooking instructions of a recipe image using cross-modal training of a CNN, an LSTM, and a Bi-Directional LSTM. The major challenges are the variable length of individual instructions, the variable number of instructions per recipe, and the presence of multiple food items in a single food image. Our model meets these challenges through transfer learning and multi-level error propagation across the different neural networks, producing condensed embeddings of cooking instructions that have high similarity with the original instructions. We experiment specifically on Indian cuisine data (food images, ingredients, cooking instructions, and contextual information) scraped from the web. The proposed model can be useful for information retrieval systems and can also be effectively utilized in automatic recipe recommendation.
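The core difficulty the abstract describes is collapsing a variable number of variable-length instructions into one fixed-size vector that can be compared against an image embedding. The following is a minimal sketch of that idea only, using mean pooling as a stand-in for the paper's LSTM/Bi-LSTM encoders; the dimension, function names, and pooling scheme are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

EMB_DIM = 8  # toy embedding size; the paper's actual dimensions are not stated


def pool_instruction(token_vecs):
    """Collapse one instruction's token vectors to a single vector.
    (Mean pooling here stands in for an LSTM instruction encoder.)"""
    return np.mean(token_vecs, axis=0)


def recipe_embedding(instructions):
    """Collapse a variable number of pooled instruction vectors into one
    fixed-size recipe embedding (stand-in for the Bi-LSTM compressor)."""
    pooled = np.stack([pool_instruction(ins) for ins in instructions])
    return pooled.mean(axis=0)


def cosine(a, b):
    """Cosine similarity, as used to compare embeddings for retrieval."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
# Two recipes with different instruction counts and instruction lengths,
# each token represented by a random EMB_DIM-dimensional vector.
recipe_a = [rng.normal(size=(n, EMB_DIM)) for n in (5, 3, 7)]
recipe_b = [rng.normal(size=(n, EMB_DIM)) for n in (4, 6)]

emb_a = recipe_embedding(recipe_a)
emb_b = recipe_embedding(recipe_b)

# Both recipes map to the same fixed-size space despite different shapes.
assert emb_a.shape == emb_b.shape == (EMB_DIM,)
print(round(cosine(emb_a, emb_a), 3))  # identical embeddings give similarity 1.0
```

In the paper's setting, the pooled recipe embedding would be trained jointly with the CNN image features so that matching image/instruction pairs score high under such a similarity measure.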


