Abstract

Food segmentation is critical to human health and is a core element of food computing, providing the basis for nutritional assessment and composition analysis. Food images differ from general images in that they usually lack a consistent spatial layout and common semantic patterns. Current food segmentation methods mainly rely on deep visual features from convolutional neural networks (CNNs); they ignore these characteristics of food images and therefore struggle to achieve the best segmentation performance. In this paper, we propose a Swin Transformer-based pyramid network that captures richer background and boundary information and adaptively combines local features with global features for the food image segmentation task. First, a pyramid pooling module (PPM) aggregates contextual information from different regions of the food image, improving the feature representation of global information. Second, the multi-scale features produced by the PPM are assembled into a feature pyramid and weighted, from which richer edge information is extracted. Experiments on the FoodSeg103 dataset show that the method outperforms traditional approaches, preserving edge and texture details with significant improvements.
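The pyramid pooling step described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption about the module's structure based on the standard PSPNet-style design (bin sizes 1, 2, 3, 6): each branch average-pools the feature map to a coarse grid, upsamples it back, and all branches are concatenated with the input. The real module would also apply a convolution per branch to reduce channels, which is omitted here.

```python
import numpy as np

def adaptive_avg_pool(x, out_size):
    """Average-pool a (C, H, W) feature map down to (C, out_size, out_size)."""
    c, h, w = x.shape
    out = np.zeros((c, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Window bounds chosen so the bins exactly tile the input.
            h0, h1 = (i * h) // out_size, -(-((i + 1) * h) // out_size)
            w0, w1 = (j * w) // out_size, -(-((j + 1) * w) // out_size)
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out

def nearest_upsample(x, h, w):
    """Nearest-neighbour upsample a (C, h0, w0) map back to (C, h, w)."""
    c, h0, w0 = x.shape
    rows = (np.arange(h) * h0) // h
    cols = (np.arange(w) * w0) // w
    return x[:, rows][:, :, cols]

def pyramid_pooling(feat, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with pooled-and-upsampled context branches."""
    c, h, w = feat.shape
    branches = [feat]
    for b in bin_sizes:
        branches.append(nearest_upsample(adaptive_avg_pool(feat, b), h, w))
    return np.concatenate(branches, axis=0)

feat = np.random.rand(4, 24, 24)   # toy backbone feature map
out = pyramid_pooling(feat)
print(out.shape)                    # (20, 24, 24): 4 input + 4 per bin
```

The 1x1 bin reduces to a global average, so the coarsest branch injects whole-image context into every spatial position, which is how the PPM improves the global feature representation the abstract refers to.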
