Recent breakthroughs in artificial intelligence have promoted the development of intelligent applications based on deep neural networks (DNNs) in the Internet of Things (IoT). However, these powerful DNNs have large numbers of parameters, high computational complexity, and substantial storage requirements, which impose heavy costs on resource-constrained IoT devices and incur unacceptable latency in smart applications. To overcome these challenges, numerous works aim to design lightweight DNNs, but they encounter two severe bottlenecks: acute accuracy degradation when model parameters are reduced, and failure to balance accuracy against resource usage under dynamic workloads. In this paper, we investigate the problems of designing and deploying DNNs on resource-constrained IoT devices. Specifically, we first design a novel multi-branch scalable neural network (MBSNN) architecture, which features multiple subnets that provide optional intelligent services at different accuracy levels. Then, we propose a stepwise incremental training technique and a learning rate dual decay strategy that efficiently train MBSNN and increase its accuracy. Afterwards, we develop a threshold selection-based adaptive inference mechanism, which selects thresholds for MBSNN to achieve optimal accuracy under given timing constraints. Extensive experiments show that, compared to benchmarking methods, MBSNN improves accuracy by up to 1.63%, reduces computational complexity by up to 32.7%, and decreases model parameters by up to 41.9%.