This paper shows how to convert a color image to grayscale with a convolutional neural network (CNN) that preserves visual contrast via gradient-domain modeling. We exploit the auxiliary-variable principle to match the dimensions of the input and output variables, and adopt the L1-norm error of the image gradients as the loss criterion. The similarity measure sums the gradient correlation between each channel of the color image and the transformed grayscale image. The final gray mapping is then obtained by reconstruction from a globally initialized grayscale image and locally derived gradient images. A weighted objective is proposed to balance robustness and the visual appearance of color images. Furthermore, by revealing the relation between color-to-gray conversion and multi-exposure fusion, we apply the network to multi-exposure fusion. Both quantitative and qualitative evaluations on decolorization and multi-exposure fusion consistently demonstrate the potential of the proposed method against existing state-of-the-art algorithms.
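To make the gradient-domain idea concrete, the following is a minimal NumPy sketch of an L1 loss on image gradients, accumulated over the color channels. It is an illustrative simplification, not the paper's exact formulation: the function names (`gradients`, `l1_gradient_loss`) and the plain per-channel summation are assumptions for this sketch, and the correlation-based similarity measure described above is not reproduced here.

```python
import numpy as np

def gradients(img):
    """Forward-difference gradients along x and y (zero at the far borders)."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return gx, gy

def l1_gradient_loss(gray, color):
    """L1 error between the gradients of a grayscale candidate and the
    per-channel gradients of the color image, summed over channels.
    `gray` is an HxW float array; `color` is HxWx3."""
    ggx, ggy = gradients(gray)
    loss = 0.0
    for c in range(color.shape[2]):
        cgx, cgy = gradients(color[:, :, c])
        loss += np.abs(ggx - cgx).mean() + np.abs(ggy - cgy).mean()
    return loss
```

In a training loop, a loss of this form would be minimized with respect to the network output `gray`, so that the grayscale result inherits the local contrast (gradients) of all three color channels rather than of a fixed luminance combination.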