Abstract

Food segmentation is critical to human health and is a core element of food computing, providing the basis for nutritional assessment and composition analysis. Food images differ from general images in that they usually lack a consistent spatial layout and common semantic patterns. Current food segmentation methods mainly rely on deep visual features from convolutional neural networks (CNNs); they ignore these characteristics of food images and therefore struggle to achieve the best segmentation performance. In this paper, we propose a Swin Transformer-based pyramid network that captures richer background and boundary information and adaptively combines local features with global features for the food image segmentation task. First, a pyramid pooling module (PPM) aggregates contextual information from different regions of the food image, improving the feature representation of global information. Second, the multi-scale features produced by the PPM are assembled into a feature pyramid and weighted, from which richer edge information is extracted. Experiments on the FoodSeg103 dataset show that the method outperforms traditional approaches, preserving edge and texture details with significant improvements.
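The pyramid pooling step described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption about the module's structure based on the standard PSPNet-style design (bin sizes 1, 2, 3, 6): each branch average-pools the feature map to a coarse grid, upsamples it back, and all branches are concatenated with the input. The real module would also apply a convolution per branch to reduce channels, which is omitted here.

```python
import numpy as np

def adaptive_avg_pool(x, out_size):
    """Average-pool a (C, H, W) feature map down to (C, out_size, out_size)."""
    c, h, w = x.shape
    out = np.zeros((c, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Window bounds chosen so the bins exactly tile the input.
            h0, h1 = (i * h) // out_size, -(-((i + 1) * h) // out_size)
            w0, w1 = (j * w) // out_size, -(-((j + 1) * w) // out_size)
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out

def nearest_upsample(x, h, w):
    """Nearest-neighbour upsample a (C, h0, w0) map back to (C, h, w)."""
    c, h0, w0 = x.shape
    rows = (np.arange(h) * h0) // h
    cols = (np.arange(w) * w0) // w
    return x[:, rows][:, :, cols]

def pyramid_pooling(feat, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with pooled-and-upsampled context branches."""
    c, h, w = feat.shape
    branches = [feat]
    for b in bin_sizes:
        branches.append(nearest_upsample(adaptive_avg_pool(feat, b), h, w))
    return np.concatenate(branches, axis=0)

feat = np.random.rand(4, 24, 24)   # toy backbone feature map
out = pyramid_pooling(feat)
print(out.shape)                    # (20, 24, 24): 4 input + 4 per bin
```

The 1x1 bin reduces to a global average, so the coarsest branch injects whole-image context into every spatial position, which is how the PPM improves the global feature representation the abstract refers to.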
