Traffic classification is indispensable for intrusion detection and resource management in the Internet of Things (IoT). Deep learning (DL)-based strategies have become key tools for traffic classification owing to their high accuracy, but they still face two challenges: 1) complex DL models are hard to deploy on resource-constrained IoT devices; and 2) their performance is limited because they ignore the similarity among different types of IoT traffic. To address these issues, we propose lightweight yet accurate models for traffic classification. First, we adopt a network-in-network architecture as the basic model to reduce model size. Second, the basic model is trained with self-distilled responses, feature maps, and similarity among traffic types to improve its identification accuracy. Next, redundant filters are removed from the basic model to obtain compressed architectures. Then, a teacher model updating scheme based on knowledge distillation is proposed to train the compressed models without compromising performance. Experimental results in two encrypted traffic classification scenarios demonstrate that, compared with the state-of-the-art Deep Packet model, the compressed model achieves the highest accuracy, handles imbalanced traffic, and reduces computational overhead by nearly 99%, confirming its efficiency.
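The abstract does not specify how the three self-distillation signals are combined. As a minimal sketch, assuming a weighted sum of (a) a temperature-scaled KL term on the softened outputs (response), (b) a mean-squared-error term on equally sized feature maps, and (c) a term that matches the cosine similarity between per-class mean features (a hypothetical reading of "similarity among traffic types"), the objective could look like the following. All function names and the weights `alpha`, `beta`, `gamma` and `temperature` are illustrative assumptions, not the authors' formulation, and the usual hard-label cross-entropy term is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> Vector:
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def response_loss(student: Sequence[float], teacher: Sequence[float], temperature: float) -> float:
    # KL(teacher || student) on softened outputs, with the common T^2 scaling.
    p_t = softmax(teacher, temperature)
    p_s = softmax(student, temperature)
    kl = sum(pt * math.log(pt / max(ps, 1e-12)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


def feature_loss(student: Sequence[float], teacher: Sequence[float]) -> float:
    # MSE between flattened feature maps; assumes both have the same size.
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)  # epsilon guards against zero vectors


def class_similarity(features: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[Tuple[int, int], float]:
    # Hypothetical "similarity among traffic types": cosine similarity
    # between the mean feature vector (prototype) of each class in the batch.
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    protos = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
    classes = sorted(protos)
    return {(a, b): cosine(protos[a], protos[b]) for a in classes for b in classes}


def similarity_loss(student_feats, teacher_feats, labels) -> float:
    # Squared difference between student and teacher class-similarity matrices.
    s = class_similarity(student_feats, labels)
    t = class_similarity(teacher_feats, labels)
    return sum((s[k] - t[k]) ** 2 for k in s) / len(s)


def distillation_loss(student_logits, teacher_logits, student_feats, teacher_feats, labels,
                      alpha=1.0, beta=1.0, gamma=1.0, temperature=4.0) -> float:
    # Assumed weighted sum of response, feature-map, and similarity terms.
    n = len(labels)
    resp = sum(response_loss(s, t, temperature) for s, t in zip(student_logits, teacher_logits)) / n
    feat = sum(feature_loss(s, t) for s, t in zip(student_feats, teacher_feats)) / n
    sim = similarity_loss(student_feats, teacher_feats, labels)
    return alpha * resp + beta * feat + gamma * sim
```

When the student's outputs and features equal the teacher's, every term is zero. In practice the teacher signal would come from whatever self-distillation source the method uses, which this sketch deliberately leaves abstract.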