Abstract

Image understanding in the era of deep learning is burgeoning, not only in terms of semantics but also towards the generation of meaningful descriptions of images. This requires cross-modal training of deep neural networks, which must be complex enough to encode the fine contextual information related to an image yet simple enough to cover a wide range of inputs. Converting a food image into its cooking description/instructions is a suitable instance of this image-understanding challenge. This paper proposes a method of obtaining compressed embeddings of the cooking instructions of a recipe image using cross-modal training of a CNN, an LSTM, and a Bi-Directional LSTM. The major challenges are the variable length of individual instructions, the variable number of instructions per recipe, and the presence of multiple food items in a single food image. Our model meets these challenges through transfer learning and multi-level error propagation across the different neural networks, producing condensed embeddings of cooking instructions that have high similarity with the original instructions. We experiment specifically on Indian cuisine data (food images, ingredients, cooking instructions, and contextual information) scraped from the web. The proposed model can be useful for information retrieval systems and can also be effectively utilized in automatic recipe recommendation.
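The core difficulty the abstract describes is collapsing a variable number of variable-length instructions into one fixed-size vector that can be compared against an image embedding. The following is a minimal sketch of that idea only, using mean pooling as a stand-in for the paper's LSTM/Bi-LSTM encoders; the dimension, function names, and pooling scheme are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

EMB_DIM = 8  # toy embedding size; the paper's actual dimensions are not stated


def pool_instruction(token_vecs):
    """Collapse one instruction's token vectors to a single vector.
    (Mean pooling here stands in for an LSTM instruction encoder.)"""
    return np.mean(token_vecs, axis=0)


def recipe_embedding(instructions):
    """Collapse a variable number of pooled instruction vectors into one
    fixed-size recipe embedding (stand-in for the Bi-LSTM compressor)."""
    pooled = np.stack([pool_instruction(ins) for ins in instructions])
    return pooled.mean(axis=0)


def cosine(a, b):
    """Cosine similarity, as used to compare embeddings for retrieval."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
# Two recipes with different instruction counts and instruction lengths,
# each token represented by a random EMB_DIM-dimensional vector.
recipe_a = [rng.normal(size=(n, EMB_DIM)) for n in (5, 3, 7)]
recipe_b = [rng.normal(size=(n, EMB_DIM)) for n in (4, 6)]

emb_a = recipe_embedding(recipe_a)
emb_b = recipe_embedding(recipe_b)

# Both recipes map to the same fixed-size space despite different shapes.
assert emb_a.shape == emb_b.shape == (EMB_DIM,)
print(round(cosine(emb_a, emb_a), 3))  # identical embeddings give similarity 1.0
```

In the paper's setting, the pooled recipe embedding would be trained jointly with the CNN image features so that matching image/instruction pairs score high under such a similarity measure.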


