Abstract

Due to the availability of large-scale datasets (e.g., ImageNet, UECFood) and the advancement of deep Convolutional Neural Networks (CNNs), image recognition in computer vision has evolved dramatically. There are currently three major approaches to using a CNN: training from scratch, using a pre-trained network off the shelf, and performing unsupervised pre-training followed by supervised fine-tuning. For people with dietary restrictions, automatic food detection and assessment are critical. In this research, we show how detection difficulties can be addressed by combining three CNNs. First, the different CNN architectures are assessed; the number of parameters in the examined models ranges from 5,000 to 160 million, depending on the number of layers. Second, the CNNs under consideration are assessed with respect to dataset size and the physical context of the images, and the results are compared in terms of performance, training time, and accuracy. Third, the accuracy of the CNNs is examined against human knowledge and classification by the human visual system (HVS). Finally, additional categorization techniques, such as bag-of-words, are considered for this problem. Based on the findings, we conclude that the HVS is more accurate when a dataset comprises a wide range of variables, whereas the CNN outperforms the HVS when the dataset is restricted to niche photos.

Keywords: CNN, GoogLeNet, Inception, ResNet, Dietary
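To make the first two CNN usage strategies from the abstract concrete, the sketch below contrasts training from scratch with fine-tuning a pre-trained network off the shelf. This is an illustrative example only, not the authors' implementation: ResNet-50 is chosen because ResNet appears in the keywords, and the number of classes, frozen layers, and optimizer settings are assumed placeholders.

```python
# Illustrative sketch (not the paper's code): two of the three CNN usage
# strategies -- training from scratch vs. fine-tuning a pre-trained
# network -- using torchvision's ResNet-50 as the example architecture.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10  # hypothetical number of food categories

# Strategy 1: train from scratch (randomly initialised weights).
scratch_model = models.resnet50(weights=None)
scratch_model.fc = nn.Linear(scratch_model.fc.in_features, NUM_CLASSES)

# Strategy 2: use a pre-trained network off the shelf and fine-tune only
# the final classification layer (transfer learning).
pretrained_model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for param in pretrained_model.parameters():
    param.requires_grad = False  # freeze the convolutional backbone
pretrained_model.fc = nn.Linear(pretrained_model.fc.in_features, NUM_CLASSES)

# Only the new classifier head is updated during training.
optimizer = torch.optim.SGD(pretrained_model.fc.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
```

The third strategy mentioned in the abstract, unsupervised pre-training followed by supervised fine-tuning, follows the same pattern as Strategy 2, except that the backbone weights come from a self-supervised or unsupervised objective rather than ImageNet labels.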
