JPEG images often suffer from undesirable compression artifacts resulting from block-wise coarse quantization of discrete cosine transform coefficients. In recent years, deep convolutional neural networks (CNNs) have achieved remarkable results in compression artifacts reduction. However, most deep CNNs are difficult to deploy on mobile devices because of their large numbers of parameters and operations. In this letter, we propose a novel deep CNN, called ESCNet, for lightweight JPEG compression artifacts reduction, in which an enhanced separable convolution (ESConv) is carefully designed to make full use of multi-scale image information for better dense pixel-value prediction. Specifically, ESConv consists of a grouped multi-scale dual depth-wise convolution (GMDDConv) and a wide-activated dual point-wise convolution (WDPConv). GMDDConv efficiently extracts abundant multi-scale spatial features, which are then fed to WDPConv for effective non-linear feature fusion. Experimental results on benchmark datasets show that, compared with state-of-the-art methods, our ESCNet not only achieves better performance in both objective indices and subjective quality but also greatly reduces network parameters and operations.
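To make the structure of ESConv concrete, the following is a minimal NumPy sketch of the general pattern the abstract describes: channel-grouped depth-wise convolutions at multiple kernel sizes (multi-scale spatial feature extraction), followed by a wide-activated point-wise stage (channel expansion, ReLU, projection) for non-linear feature fusion. The two-group split, kernel sizes 3 and 5, expansion ratio 4, and random weights are illustrative assumptions, not the paper's actual GMDDConv/WDPConv configuration.

```python
import numpy as np

def depthwise_conv(x, kernels):
    """Per-channel ('depth-wise') 2-D convolution with 'same' padding.
    x: (C, H, W); kernels: one (k, k) filter per channel."""
    C, H, W = x.shape
    out = np.zeros_like(x)
    for c in range(C):
        k = kernels[c]
        p = k.shape[0] // 2
        xp = np.pad(x[c], p)
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def pointwise_conv(x, w):
    """1x1 ('point-wise') convolution = channel mixing.
    x: (C_in, H, W); w: (C_out, C_in) -> (C_out, H, W)."""
    return np.tensordot(w, x, axes=([1], [0]))

def esconv_sketch(x, rng):
    """Illustrative ESConv-style block (assumed configuration)."""
    C = x.shape[0]
    # Multi-scale depth-wise stage: half the channels use 3x3 kernels,
    # the other half 5x5, so different scales are captured per group.
    ks = [3] * (C // 2) + [5] * (C - C // 2)
    kernels = [rng.standard_normal((k, k)) * 0.1 for k in ks]
    feat = depthwise_conv(x, kernels)
    # Wide-activated point-wise stage: expand channels before the
    # non-linearity, then project back to C channels.
    expand = 4
    w1 = rng.standard_normal((C * expand, C)) * 0.1
    w2 = rng.standard_normal((C, C * expand)) * 0.1
    h = np.maximum(pointwise_conv(feat, w1), 0.0)  # ReLU on the wide features
    return pointwise_conv(h, w2)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))   # toy feature map: 4 channels, 8x8
y = esconv_sketch(x, rng)
print(y.shape)  # (4, 8, 8): spatial size and channel count preserved
```

The depth-wise/point-wise factorization is what keeps such a block lightweight: spatial filtering costs scale per channel rather than per channel pair, and only the cheap 1x1 stage mixes channels.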