Abstract

With the development of deep learning, vision-based food nutrition estimation is gradually entering public view because of its accuracy and efficiency. In this paper, we designed an RGB-D fusion network that integrated multimodal feature fusion (MMFF) and multi-scale fusion for vision-based nutrition assessment. MMFF performed effective feature fusion using a balanced feature pyramid and a convolutional block attention module. Multi-scale fusion combined features of different resolutions through a feature pyramid network. Both modules enhanced the feature representation and thereby improved the performance of the model. Compared with state-of-the-art methods, our method achieved a mean percentage mean absolute error (PMAE) of 18.5%. With the RGB-D fusion network, the PMAE for calories and mass reached 15.0% and 10.8%, improvements of 3.8% and 8.1%, respectively. Furthermore, this study visualized the estimation results for four nutrients and verified the validity of the method. This research contributes to the development of automated food nutrient analysis (code and models can be found at http://123.57.42.89/codes/RGB-DNet/nutrition.html).
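The abstract reports accuracy as PMAE but does not define it. A common definition (used, for example, with the Nutrition5k benchmark) divides the mean absolute error by the mean ground-truth value, and a "mean PMAE" then averages this score across the predicted quantities. The sketch below assumes that definition; the paper's exact formula and set of nutrients may differ, and the function names `pmae` and `mean_pmae` are hypothetical.

```python
from statistics import fmean


def pmae(predictions, targets):
    """Percentage MAE (assumed definition): MAE divided by the mean ground truth.

    Returns a fraction; multiply by 100 to express it as a percentage.
    """
    if len(predictions) != len(targets) or len(targets) == 0:
        raise ValueError("predictions and targets must be non-empty and of equal length")
    mae = fmean(abs(p - t) for p, t in zip(predictions, targets))
    return mae / fmean(targets)


def mean_pmae(pred_by_quantity, true_by_quantity):
    """Compute PMAE per quantity (e.g. calories, mass) and their unweighted mean."""
    scores = {
        name: pmae(pred_by_quantity[name], true_by_quantity[name])
        for name in true_by_quantity
    }
    return scores, fmean(scores.values())


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    preds = {"calories": [210, 480, 150], "mass": [300, 120]}
    truth = {"calories": [200, 500, 160], "mass": [320, 100]}
    per_quantity, average = mean_pmae(preds, truth)
    for name, score in per_quantity.items():
        print(f"{name}: {100 * score:.1f}%")
    print(f"mean PMAE: {100 * average:.1f}%")
```

Normalizing by the mean ground truth makes errors comparable across quantities with very different scales (kilocalories versus grams), which is what allows a single averaged figure to be reported.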
