Depth maps captured by consumer-level sensors are usually noisy and of low resolution (LR). Existing methods for guided depth super-resolution, which rely on pre-defined local or global models (e.g., the joint bilateral filter and the Markov random field), perform well in general cases. However, such model-based methods may fail to describe the underlying relationship between the two images of an RGB-D pair. To address this problem, this paper proposes a data-driven approach based on a deep convolutional neural network with global and local residual learning. The network progressively upsamples the LR depth map at multiple scales, guided by the high-resolution intensity image. Global residual learning is adopted to learn the difference between the ground truth and the coarsely upsampled depth map, and local residual learning is introduced in each scale-dependent reconstruction sub-network. This scheme restores the depth structure from coarse to fine via multi-scale frequency synthesis. In addition, batch normalization layers are used to improve depth map denoising. The method is evaluated in both noise-free and noisy cases and compared comprehensively against 17 state-of-the-art methods. Qualitative and quantitative evaluations show that the proposed method converges faster and achieves better performance.
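The coarse-to-fine data flow described above can be illustrated with a minimal sketch. It is not the network proposed in the paper. All names (`upsample2x`, `add`, `progressive_sr`, `refine`, `guides`) are hypothetical, nearest-neighbour upsampling stands in for the coarse interpolation, and the `refine` callable stands in for a learned, intensity-guided reconstruction sub-network. Accumulating a running residual across scales is a simplifying assumption made here to show how a coarse upsampled map and a learned residual might be combined.

```python
from typing import Callable, List

Image = List[List[float]]  # a tiny row-major image, used only for illustration


def upsample2x(img: Image) -> Image:
    """Nearest-neighbour 2x upsampling (stand-in for the coarse interpolation)."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def add(a: Image, b: Image) -> Image:
    """Element-wise sum of two images of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def progressive_sr(lr_depth: Image,
                   guides: List[Image],
                   refine: Callable[[Image, Image], Image]) -> Image:
    """Upsample `lr_depth` by 2x per entry in `guides` (ordered coarse to fine).

    `refine(estimate, guide)` is a hypothetical stand-in for a learned
    sub-network; it returns a correction with the same size as `estimate`.
    """
    coarse = lr_depth
    residual: Image = [[0.0] * len(row) for row in lr_depth]
    for guide in guides:
        coarse = upsample2x(coarse)      # coarse path: plain interpolation
        residual = upsample2x(residual)  # carry the residual to the new scale
        estimate = add(coarse, residual)
        # Local residual step: the stand-in only predicts a correction that is
        # added to the running residual, instead of re-predicting it.
        residual = add(residual, refine(estimate, guide))
    # Global residual step: output = coarse upsampled depth + predicted residual.
    return add(coarse, residual)
```

With a `refine` that always returns zeros, `progressive_sr` reduces to plain nearest-neighbour upsampling, which shows that in this sketch the stand-in sub-networks only have to account for what the coarse path misses.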