Abstract

An improved model, TR-YOLO, is applied to Asian food object detection. First, a ViT (Vision Transformer) module is introduced into the model to make better use of global features. Second, a Swin Transformer module is added to each of the three detection branches to produce the output features. Finally, a feature fusion method, Mconcat, is proposed; it enables the model to learn feature weights and assign them to feature channels independently. Experimental results show that TR-YOLO further improves detection accuracy.
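The abstract does not define Mconcat in detail. As a minimal sketch of one plausible reading (concatenation along the channel axis followed by an independent, learnable weight per channel), the idea might look like the code below. The function names `sigmoid` and `weighted_concat`, the use of a sigmoid to bound each weight, and the nested-list representation of feature maps are all assumptions made for illustration and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Squash a raw (learnable) logit into a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def weighted_concat(branches, channel_logits):
    """Hypothetical weighted concatenation (not the paper's exact Mconcat).

    branches: list of feature maps; each feature map is a list of channels,
              and each channel is a flat list of floats.
    channel_logits: one raw weight per channel of the concatenated result.
                    In a real model these would be trained parameters.
    """
    # Step 1: plain concatenation along the channel axis.
    channels = [channel for branch in branches for channel in branch]
    if len(channels) != len(channel_logits):
        raise ValueError("need exactly one logit per concatenated channel")
    # Step 2: scale every channel by its own weight, independently of the others.
    return [
        [sigmoid(logit) * value for value in channel]
        for channel, logit in zip(channels, channel_logits)
    ]


# Usage: two branches with 1 and 2 channels; all logits at 0 give weight 0.5.
branch_a = [[1.0, 2.0]]
branch_b = [[3.0, 4.0], [5.0, 6.0]]
fused = weighted_concat([branch_a, branch_b], [0.0, 0.0, 0.0])
```

In a trained network the logits would be updated by backpropagation, so channels that help detection could receive larger weights than uninformative ones. Whether the paper weights individual channels or whole input branches is not stated in the abstract.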
