The fast development of the Internet of Things (IoT) and deep learning enables learning useful patterns from the massive amount of collected data with sporadic nodes in IoT systems. Federated learning has received increasing attention in distributed machine learning where only intermediate parameters are exchanged with training samples that resided at local nodes. Nonetheless, most of the existing federated learning schemes assume a homogeneous distribution of data. The assumption, however, does not apply to IoT systems because of the heterogeneity of the IoT architecture. The nonindependent and identical distribution (non-IID) property in data volume and statistical distribution of IoT nodes can impact the performance of an aggregated global model that fits all nodes. Existing federated learning solutions for non-IID data sets either have to train additional models or require extra data exchange to check the node distribution. However, due to resource constraints in IoT systems, these approaches will increase the burden on limited computation capacity and cause network overhead. To address the issue, in this article, a novel weight-similarity-based client clustering (WSCC) approach is proposed, in which clients are split into different groups based on their data set distributions. An affinity-propagation-based method with the cosine distance of the client’s weight parameters is designed to iteratively and automatically determine dynamic clusters. The proposed approach is ideal for IoT systems since there are no auxiliary models and extra data transmissions are needed. Through the theoretical convergence analysis and empirical results, we show that our proposed WSCC scheme outperforms the representative federated learning schemes under different non-IID settings, achieving up to 20% improvements in accuracy.