Abstract

MLP-Mixer is a vision architecture that relies solely on multilayer perceptrons (MLPs). Despite its simple design, it achieves accuracy only slightly below that of state-of-the-art models on ImageNet. Because MLP-Mixer segments each input image into a fixed number of patches, small-scale MLP-Mixers (those with smaller patches) are generally preferred: they segment the image into more patches and thereby attain better accuracy. However, this strategy significantly increases the computational burden. This paper argues that, even within the same dataset, images differ in recognition difficulty because of their individual characteristics. Ideally, therefore, the most computationally economical approach is to choose an independently scaled MLP-Mixer for each image. We verify experimentally that this phenomenon exists, which motivates the Multi-Scale MLP-Mixer (MSMLP), a model that uses a suitably scaled MLP-Mixer for each input image. MSMLP comprises several MLP-Mixers of different scales. During testing, these MLP-Mixers are activated in order of scale from large to small (that is, with an increasing number of patches and a decreasing patch size). In addition, to reduce redundant computation, a feature reuse mechanism is designed between neighboring MLP-Mixers so that a downstream small-scale MLP-Mixer can reuse the features learned by the upstream larger-scale one. Finally, extensive experiments on the public CIFAR10/100 datasets show that our method significantly outperforms MLP-Mixer in both theoretically estimated computational cost and actual inference speed.
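
The large-to-small activation order and the feature reuse described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how a suitable scale is selected for an image, so the confidence threshold used as the stopping rule is an assumption, as are the names `num_patches`, `cascade_predict`, and `make_toy_stage`, the stage signature, and the patch sizes 8, 4, and 2. The only fixed fact is that CIFAR images are 32x32 pixels.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def num_patches(image_size: int, patch_size: int) -> int:
    """Count the non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("patch_size must divide image_size")
    return (image_size // patch_size) ** 2


# Hypothetical stage signature: (image, reused_features) -> (class_probs, features).
Stage = Callable[[object, Optional[object]], Tuple[Sequence[float], object]]


def cascade_predict(
    image: object, stages: List[Stage], threshold: float = 0.9
) -> Tuple[int, int]:
    """Run stages from large scale to small scale.

    Returns (predicted_label, index_of_stage_that_answered).
    ASSUMPTION: a stage's answer is accepted once its top class probability
    reaches `threshold`; the abstract does not state the actual criterion.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    features = None  # nothing to reuse before the first (largest-scale) stage
    for index, stage in enumerate(stages):
        # Each stage receives the features produced by the previous stage.
        probs, features = stage(image, features)
        is_last = index == len(stages) - 1
        if max(probs) >= threshold or is_last:
            label = max(range(len(probs)), key=lambda i: probs[i])
            return label, index
    raise AssertionError("unreachable: the last stage always returns")


def make_toy_stage(probs: Sequence[float]) -> Stage:
    """Build a stand-in stage that always returns fixed probabilities."""
    def stage(image: object, reused: Optional[object]) -> Tuple[Sequence[float], object]:
        depth = 0 if reused is None else int(reused)
        return probs, depth + 1  # "features" here just count the stages run so far
    return stage


if __name__ == "__main__":
    for patch_size in (8, 4, 2):  # hypothetical scales, large to small
        print(patch_size, num_patches(32, patch_size))
    easy = [make_toy_stage([0.95, 0.05]), make_toy_stage([0.10, 0.90])]
    hard = [make_toy_stage([0.60, 0.40]), make_toy_stage([0.10, 0.90])]
    print(cascade_predict(None, easy))  # stops at the first stage
    print(cascade_predict(None, hard))  # falls through to the second stage
```

Under these assumptions, halving the patch size quadruples the number of patches, which is why stopping at a larger-scale stage for easy images saves computation.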
