On-device deep learning has attracted increasing interest in recent years. CPUs are the most common commercial hardware on mobile devices, and many training libraries have been developed and optimized for them. However, CPU training performance (i.e., training time) remains poor, largely due to asymmetric multiprocessor architectures. Moreover, battery-powered devices operate under tight energy constraints. In federated training, local training should complete quickly so that the global model converges fast; at the same time, energy consumption should be minimized to avoid degrading the user experience. To this end, we jointly consider energy and training time and propose a novel framework with a machine learning-based adaptive configuration allocation strategy, which selects optimal configuration combinations for efficient on-device training. We conduct experiments on the popular MNN library, and the results show that the adaptive allocation algorithm substantially reduces energy consumption compared to running all batches with fixed configurations on off-the-shelf CPUs.
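
To make the idea concrete, the sketch below illustrates one way such an adaptive allocator could work; this is a minimal illustration under our own assumptions, not the paper's actual design. A learned model predicts per-batch training time and energy for each candidate CPU configuration (e.g., how many big/little cores to use and at what frequency), and the allocator picks the configuration with minimal predicted energy subject to a per-batch time budget. The `Configuration` fields, the `predict` callback, and the `time_budget_s` parameter are all hypothetical names introduced for this sketch.

```python
# Minimal sketch of an ML-based adaptive configuration allocator.
# Illustrative only: the configuration fields, the predictor interface,
# and the time-budget constraint are assumptions, not the paper's method.
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Configuration:
    big_cores: int      # "big" cores used on the asymmetric CPU
    little_cores: int   # "little" cores used
    freq_mhz: int       # CPU frequency setting


def choose_configuration(
    batch_features: Sequence[float],
    candidates: Sequence[Configuration],
    predict: Callable[[Sequence[float], Configuration], Tuple[float, float]],
    time_budget_s: float,
) -> Configuration:
    """Return the candidate with minimal predicted energy whose predicted
    per-batch time fits the budget; fall back to the fastest candidate
    if no configuration meets the budget."""
    best, best_energy = None, float("inf")
    fastest, fastest_time = None, float("inf")
    for cfg in candidates:
        t, e = predict(batch_features, cfg)  # (seconds, joules)
        if t < fastest_time:
            fastest, fastest_time = cfg, t
        if t <= time_budget_s and e < best_energy:
            best, best_energy = cfg, e
    return best if best is not None else fastest


if __name__ == "__main__":
    # Toy linear predictor standing in for a model trained on profiled
    # (configuration, batch) measurements; numbers are made up.
    def toy_predict(feats, cfg):
        work = feats[0]  # e.g., batch size as a proxy for work
        speed = (cfg.big_cores * 2.0 + cfg.little_cores) * cfg.freq_mhz / 1000.0
        t = work / speed
        power = (cfg.big_cores * 1.5 + cfg.little_cores * 0.5) * cfg.freq_mhz / 1000.0
        return t, t * power

    cfgs = [Configuration(b, l, f)
            for b in (0, 2, 4) for l in (0, 2, 4)
            for f in (1000, 2000) if b + l > 0]
    print(choose_configuration([32.0], cfgs, toy_predict, time_budget_s=0.5))
```

In such a scheme, the allocator would be queried before each batch is launched, so the chosen configuration can track workload changes across batches rather than fixing one configuration for the entire training run.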