Cryo-electron microscopy (cryo-EM) is a widely used technique for structure determination. Because cryo-EM images have an extremely low signal-to-noise ratio (SNR), clustering single-particle cryo-EM images with high accuracy is challenging. To address this, we propose an iterative denoising and clustering method based on a deep convolutional variational autoencoder and K-means++. The proposed method contains two modules: a denoising ResNet variational autoencoder (DRVAE) and Balance size K-means++ (BSK-means++). First, the DRVAE is trained in a fully unsupervised manner to initialize the neural network and obtain preliminary denoised images. Second, BSK-means++ clusters the denoised images, and the images closer to their class centers are designated as reliable samples. Third, training of the DRVAE continues, with the class-average images used as pseudo-supervision for the reliable samples so that the denoised images preserve more detail. Finally, the second and third steps are performed jointly and iteratively until convergence. Experimental results show that the proposed method generates reliable class-average images and achieves higher clustering accuracy and normalized mutual information than current methods. This study confirms that DRVAE with BSK-means++ achieves good denoising performance on single-particle cryo-EM images, which can help researchers obtain information such as the symmetry and heterogeneity of the target particles. In addition, the proposed method avoids extreme imbalance in class size, which improves the reliability of the clustering results.
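
The clustering step can be illustrated with a minimal sketch. The abstract does not specify how BSK-means++ balances class sizes or how many samples count as reliable, so the code below makes its own assumptions: clusters are capped at ceil(n / k) members through a greedy nearest-first assignment, and a hypothetical `reliable_frac` parameter selects the fraction of each class closest to its center. The function names and the feature matrix `X` (standing in for flattened denoised images) are illustrative and are not taken from the authors' implementation.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """K-means++ seeding: each new center is drawn with probability
    proportional to the squared distance to the nearest chosen center."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        diff = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diff ** 2).sum(-1).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(len(X), 1.0 / len(X))
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)


def balanced_assign(X, centers, cap):
    """Assumed balancing rule: visit (sample, cluster) pairs from nearest
    to farthest and accept a pair only if the cluster is below `cap`."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # shape (n, k)
    n, k = d2.shape
    labels = np.full(n, -1)
    counts = np.zeros(k, dtype=int)
    for flat in np.argsort(d2, axis=None):
        i, j = divmod(int(flat), k)
        if labels[i] == -1 and counts[j] < cap:
            labels[i] = j
            counts[j] += 1
    return labels, d2


def balanced_kmeans(X, k, n_iter=20, reliable_frac=0.5, seed=0):
    """Size-capped K-means with K-means++ seeding, plus a boolean mask of
    'reliable' samples (the closest `reliable_frac` of each cluster)."""
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(X, k, rng)
    cap = int(np.ceil(len(X) / k))  # assumed size limit per cluster
    for _ in range(n_iter):
        labels, _ = balanced_assign(X, centers, cap)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels, d2 = balanced_assign(X, centers, cap)
    dist = d2[np.arange(len(X)), labels]
    reliable = np.zeros(len(X), dtype=bool)
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        keep = int(np.ceil(reliable_frac * len(idx)))
        reliable[idx[np.argsort(dist[idx])[:keep]]] = True
    return labels, centers, reliable
```

In the iterative scheme described above, the class averages of the resulting clusters would serve as pseudo-supervision targets for the samples flagged in `reliable` during the next round of DRVAE training, after which the clustering is run again on the newly denoised images.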