Abstract

In recent times, with the exception of sporadic cases, the trend in computer vision has been to achieve minor accuracy improvements at the cost of considerable increases in complexity. To reverse this trend, we propose a novel method to boost image classification performance without increasing complexity. To this end, we revisited ensembling, a powerful approach that is often avoided due to its added complexity and training time, and made it feasible through a specific design choice. First, we trained two EfficientNet-b0 end-to-end models (known to be the architecture with the best overall accuracy/complexity trade-off for image classification) on disjoint subsets of the data (i.e., bagging). Then, we built an efficient adaptive ensemble by fine-tuning a trainable combination layer. In this way, we outperformed the state-of-the-art by an average of 0.5% in accuracy, with restrained complexity both in the number of parameters (5–60 times fewer) and in FLoating point Operations Per Second (FLOPS; 10–100 times fewer), on several major benchmark datasets.
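The abstract gives no implementation details, but the pipeline it describes (bagging two EfficientNet-b0 backbones, then fine-tuning a trainable combination layer) could be sketched in PyTorch roughly as below. The torchvision backbone, the logit-level concatenation, and the single linear combination layer are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
from torch.utils.data import random_split
from torchvision.models import efficientnet_b0


def make_backbone(num_classes: int) -> nn.Module:
    """EfficientNet-b0 with its classifier head resized to the target classes."""
    model = efficientnet_b0(weights=None)
    model.classifier[1] = nn.Linear(model.classifier[1].in_features, num_classes)
    return model


class AdaptiveEnsemble(nn.Module):
    """Two frozen backbones whose logits are merged by a trainable layer."""

    def __init__(self, model_a: nn.Module, model_b: nn.Module, num_classes: int):
        super().__init__()
        self.model_a = model_a
        self.model_b = model_b
        # Only the combination layer is fine-tuned; the backbones stay fixed.
        for backbone in (self.model_a, self.model_b):
            for p in backbone.parameters():
                p.requires_grad = False
        # Trainable combination layer over the concatenated logits (assumed
        # form; the abstract does not specify the layer's exact structure).
        self.combine = nn.Linear(2 * num_classes, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.combine(torch.cat([self.model_a(x), self.model_b(x)], dim=1))


# Bagging: each backbone is trained end-to-end on a disjoint half of the data.
# `train_set` (any torch.utils.data.Dataset) and NUM_CLASSES are placeholders.
# half = len(train_set) // 2
# subset_a, subset_b = random_split(train_set, [half, len(train_set) - half])
# ... train model_a on subset_a and model_b on subset_b with a standard loop ...
# ensemble = AdaptiveEnsemble(model_a, model_b, NUM_CLASSES)
# ... then fine-tune only ensemble.combine.parameters() ...
```

Under these assumptions, the fine-tuning stage updates only a single linear layer, which is consistent with the abstract's claim of restrained complexity relative to conventional ensembling.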
