Deep learning is a research direction within machine learning, introduced to bring the field closer to its original goal. Accurate dish recognition is becoming increasingly important in the multimedia community because it supports cuisine recommendation, calorie management, service improvement, and other food computing tasks. Many novel approaches have been developed for web recipes and menu pictures, while few address real-life dish image analysis. In this study, a deep learning-based prototype system is deployed in a Chinese canteen, and a data set covering 28 dish types, 16,904 images, and 45,061 instances is collected. Three practical issues are explored with the prototype system: the selection of the backbone network, the choice of training strategy, and the minimum number of samples needed for model upgrading. Experimental results suggest that a fine-tuned Faster-RCNN can serve as the backbone network of the prototype system, since it outperforms the other four fine-tuned networks on dish recognition (accuracy, 98.10%; recall, 97.20%; mean average precision (MAP), 98.30%) and satisfies the real-time requirement (0.15 seconds per image). Meanwhile, the transferred backbone network achieves superior results (MAP, 96.48%) over the same architecture trained from scratch (MAP, 87.84%). On model upgrading, the outcome improves from good (MAP, 91.34%) to better (MAP, 96.48%) when the training size is increased from 50 to 200 samples per dish type, and 150 or more instances should be annotated when a new dish type is added to the system's recognition list. In conclusion, the real-life deployment and evaluation of the prototype system indicate that deep learning has considerable potential to enhance customer experience through accurate daily dish recognition.