On benchmark images, modern dehazing methods are able to achieve very comparable results whose differences are too subtle for people to qualitatively judge. Thus, it is imperative to adopt quantitative evaluation on a vast number of hazy images. However, existing quantitative evaluation schemes are not convincing due to a lack of appropriate datasets and poor correlations between metrics and human perceptions. In this work, we attempt to address these issues, and we make two contributions. First, we establish two benchmark datasets, i.e., the BEnchmark Dataset for Dehazing Evaluation (BeDDE) and the EXtension of the BeDDE (exBeDDE), which had been lacking for a long period of time. The BeDDE is used to evaluate dehazing methods via full reference image quality assessment (FR-IQA) metrics. It provides hazy images, clear references, haze level labels, and manually labeled masks that indicate the regions of interest (ROIs) in image pairs. The exBeDDE is used to assess the performance of dehazing evaluation metrics. It provides extra dehazed images and subjective scores from people. To the best of our knowledge, the BeDDE is the first dehazing dataset whose image pairs were collected in natural outdoor scenes without any simulation. Second, we provide a new insight that dehazing involves two separate aspects, i.e., visibility restoration and realness restoration, which should be evaluated independently; thus, to characterize them, we establish two criteria, i.e., the visibility index (VI) and the realness index (RI), respectively. The effectiveness of the criteria is verified through extensive experiments. Furthermore, 14 representative dehazing methods are evaluated as baselines using our criteria on BeDDE. Our datasets and relevant code are available at https://github.com/xiaofeng94/BeDDE-for-defogging.
Read full abstract