Abstract

In this paper, we propose a novel hybrid transformer architecture for food cuisine detection and classification. The work combines a Vision Transformer ensemble architecture with hand-crafted features, yielding a hybrid Vision Transformer food recognition system. Vision Transformers have recently been introduced as an alternative to convolutional neural networks (CNNs) for classification. They perform pattern detection and classification without convolutions by interpreting an image as a sequence of patches. The combination of the Vision Transformer with hand-crafted features, namely GIST, HoG (Histogram of Oriented Gradients), and LBP (Local Binary Pattern), was applied to the dataset. The dataset was created specifically for this work from the public logging system. It consists of 13 food categories with 400 images of Indian food items such as Ghevar, Idli, and Dosa. This source helped capture a variety of images across domains and cultures. The work uses common and readily available food items, and the dataset can be extended by adding regional specialties. We performed experiments on CNNs with classifiers such as random forest and SVM, and we compared the proposed approach with several ensembles of CNN architectures. The experiments showed that the proposed approach outperformed state-of-the-art ensemble CNN architectures for detecting food cuisines. The proposed hybrid approach achieved an accuracy of 94.63%, a sensitivity of 84.42%, a specificity of 95.23%, and a kappa coefficient of 0.93, the best among all approaches evaluated.
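To illustrate what interpreting an image as a sequence of patches means in practice, the following is a minimal sketch in NumPy. It is not the authors' implementation: the function name `image_to_patches`, the 224 x 224 input size, and the patch size of 16 are assumptions chosen only for illustration, since the abstract does not state the configuration used.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened square patches.

    Hypothetical helper for illustration; the result has shape
    (num_patches, patch_size * patch_size * C), ordered row by row.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Expose the patch grid: (rows, p, cols, p, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    # Bring the two grid axes together: (rows, cols, p, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten each patch into one vector, i.e. one token of the sequence
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# Assumed input size and patch size, not taken from the paper.
image = np.zeros((224, 224, 3), dtype=np.float32)
sequence = image_to_patches(image, patch_size=16)
print(sequence.shape)  # (196, 768)
```

In a Vision Transformer, each flattened patch would then be linearly projected and combined with a position embedding before entering the transformer encoder; those steps are omitted here.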
