According to the atmospheric physical model, a hazy image can be converted into a clean one given accurate transmittance and atmospheric-light information. Scene-depth information is crucial for image dehazing because the transmittance corresponds directly to the scene depth. In this paper, we propose a multi-scale depth-information fusion network based on the U-Net architecture. The model takes hazy images as input, extracts depth information from them, and then encodes and decodes this information; during this process, hazy-image features at different scales are skip-connected to the corresponding positions. Finally, the model outputs a clean image. The proposed method does not rely on the atmospheric physical model and directly outputs clean images in an end-to-end manner. Extensive experiments show that the multi-scale depth-information fusion network effectively removes haze: it outperforms other methods on synthetic datasets and also performs well on a real-scene test set.
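For context, the atmospheric physical model referred to above is commonly formulated as the standard atmospheric scattering model; a sketch of that widely used formulation (not a derivation specific to this paper) is:

```latex
% Atmospheric scattering model: observed hazy image I at pixel x
%   I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr)
% where J is the haze-free scene radiance, A the global atmospheric light,
% and t the transmittance, which decays with scene depth d(x):
%   t(x) = e^{-\beta d(x)}
% (\beta is the scattering coefficient of the atmosphere).
\begin{align}
  I(x) &= J(x)\,t(x) + A\bigl(1 - t(x)\bigr), \\
  t(x) &= e^{-\beta d(x)}.
\end{align}
```

The second equation makes the claimed link explicit: transmittance is an exponential function of scene depth, which is why accurate depth information is central to dehazing.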