ABSTRACT We present a method to reconstruct the initial linear-regime matter density field from the late-time non-linearly evolved density field in which we channel the output of standard first-order reconstruction to a convolutional neural network (CNN). Our method shows dramatic improvement over the reconstruction of either component alone. We show why CNNs are not well-suited for reconstructing the initial density directly from the late-time density: CNNs are local models, but the relationship between initial and late-time density is not local. Our method leverages standard reconstruction as a preprocessing step, which inverts bulk gravitational flows sourced over very large scales, transforming the residual reconstruction problem from long-range to local and making it ideally suited for a CNN. We develop additional techniques to account for redshift distortions, which warp the density fields measured by galaxy surveys. Our method improves the range of scales of high-fidelity reconstruction by a factor of 2 in wavenumber above standard reconstruction, corresponding to a factor of 8 increase in the number of well-reconstructed modes. In addition, our method almost completely eliminates the anisotropy caused by redshift distortions. As galaxy surveys continue to map the Universe in increasingly greater detail, our results demonstrate the opportunity offered by CNNs to untangle the non-linear clustering at intermediate scales more accurately than ever before.