Face super-resolution reconstruction is the process of predicting high-resolution face images from one or more observed low-resolution face images; it is a typical ill-posed problem. As a domain-specific super-resolution task, it can exploit facial prior knowledge to improve reconstruction quality. We propose a face image super-resolution reconstruction method based on combined representation learning, using a deep residual network as the generator and a deep neural network as the discriminator. First, the model uses residual learning and symmetric cross-layer connections to extract multilevel features. Local residual mapping improves the expressive capability of the network, alleviates gradient vanishing during training, and reduces the number of convolution kernels in the model through feature reuse, yielding a high-dimensional visual feature representation of the face image. This visual feature is passed to the decoder through the cross-layer connection structure, where deconvolution layers gradually restore the spatial dimensions and recover facial details and textures. Finally, attention blocks are combined with residual blocks in the deep residual network to reconstruct super-resolved face images that are highly similar to the high-resolution images and difficult for the discriminator to distinguish. On this basis, combined representation learning is conducted to obtain perceptually realistic results. Experimental results on face datasets show that the proposed method improves the Peak Signal-to-Noise Ratio (PSNR).
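The two building blocks named above, local residual mapping and channel attention, can be sketched in miniature. The following NumPy toy is an illustration only, not the authors' implementation: the "convolution" inside the residual branch is replaced by a small linear map, and the attention block is assumed to be a squeeze-and-excitation-style channel gate, since the abstract does not specify its exact form.

```python
import numpy as np

def residual_block(x, weight):
    """Local residual mapping y = x + F(x): the block only learns the
    residual F, which eases gradient flow in deep networks. Here F is a
    toy linear map plus ReLU, standing in for the conv layers."""
    f = np.maximum(weight @ x, 0.0)          # F(x): "conv" + ReLU
    return x + f                             # identity skip connection

def channel_attention(features):
    """Assumed squeeze-and-excitation-style attention: pool each channel
    to a scalar, squash to a (0, 1) weight, and rescale the channels."""
    pooled = features.mean(axis=1)           # global average pool per channel
    weights = 1.0 / (1.0 + np.exp(-pooled))  # sigmoid gate
    return features * weights[:, None]       # reweight channels

rng = np.random.default_rng(0)
x = rng.standard_normal(4)                   # toy 4-dim feature vector
w = rng.standard_normal((4, 4))
y = residual_block(x, w)
feats = rng.standard_normal((3, 5))          # 3 channels, 5 spatial positions
att = channel_attention(feats)
print(y.shape, att.shape)
```

Because the skip connection adds the input back unchanged, a residual branch that outputs zero leaves the features intact, which is what makes very deep stacks of such blocks trainable.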