Abstract

Accessibility on edge devices and the trade-off between latency and accuracy are key concerns when deploying deep learning models. This paper presents DynK-Hydra, a Mixture of Experts system that trains an ensemble of multiple similar branches on data sets with a large number of classes but uses only a subset of the branches during inference. We achieve this by training a cohort of specialized branches (deep networks of reduced size) and a gater/supervisor that dynamically decides which branches to use for each input. An original contribution is that the number of selected branches is set dynamically, based on how confident the gater is (similar works use a static parameter for this). Another contribution is the way we enforce branch specialization: we divide the data set classes into multiple clusters, assign a cluster to each branch, and enforce the branch's specialization on its cluster with a separate loss function. We evaluate DynK-Hydra on the CIFAR-100, Food-101, CUB-200, and ImageNet32 data sets and obtain accuracy improvements of up to 4.3% over state-of-the-art ResNet, while reducing the number of inference FLOPs by a factor of 2 to 5.5. Compared to a similar work, HydraRes, we obtain marginal accuracy improvements of up to 1.2% on architectures with pairwise comparable inference times, while improving inference time by up to 2.8 times.
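
The confidence-driven selection described above can be illustrated with a minimal sketch. The abstract does not specify the exact rule DynK-Hydra uses, so the cumulative-confidence threshold below, along with the function and parameter names, is an assumption chosen only to show how the number of active branches can vary per input rather than being a fixed top-k.

```python
# Minimal sketch of confidence-based dynamic branch selection.
# Assumption: the gater emits one probability per cluster/branch and we keep
# the fewest branches whose cumulative confidence reaches a threshold; the
# actual DynK-Hydra rule may differ.
import numpy as np


def select_branches(gater_probs: np.ndarray, threshold: float = 0.9) -> list:
    """Return indices of the fewest branches whose cumulative gater
    confidence reaches `threshold` (hypothetical selection rule)."""
    order = np.argsort(gater_probs)[::-1]        # most confident branches first
    cumulative = np.cumsum(gater_probs[order])   # running confidence mass
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, len(order))
    return order[:k].tolist()


# A confident gater activates a single branch; an uncertain one activates several.
print(select_branches(np.array([0.92, 0.05, 0.02, 0.01])))  # -> [0]
print(select_branches(np.array([0.40, 0.35, 0.15, 0.10])))  # -> [0, 1, 2]
```

Under this kind of rule, easy inputs incur the cost of only one small branch, while ambiguous inputs spread computation over several, which is consistent with the FLOP reductions reported above.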
