On Hypothesis Testing for Comparing Image Quality Assessment Metrics [Tips & Tricks]

Rui Zhu, Fei Zhou, Wenming Yang, Jing-Hao Xue

doi:10.1109/msp.2018.2829209

Abstract

In developing novel image quality assessment (IQA) metrics, researchers should compare their proposed metrics with state-of-the-art metrics. A commonly adopted approach is to compare two sets of residuals, namely the differences between the nonlinearly mapped scores of each of two IQA metrics and the difference mean opinion score (DMOS); these residuals are assumed to follow zero-mean Gaussian distributions. An F-test is then used to test the equality of the variances of the two sets of residuals. If the variances are significantly different, we conclude that the residuals come from different Gaussian distributions and that the two IQA metrics differ significantly. However, the F-test assumes that the two sets of residuals are independent, whereas, because both IQA metrics are computed on the same database, the two sets of residuals are paired and may be correlated. We point out this improper use of the F-test by practitioners, which can lead to misleading comparisons of two IQA metrics. To address this practical problem, we introduce the Pitman test to investigate the equality of variances for two sets of correlated residuals. Experiments on the Laboratory for Image and Video Engineering (LIVE) database show that the two tests can lead to different conclusions.
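To make the contrast between the two tests concrete, here is a minimal sketch in Python (NumPy and SciPy), assuming the two residual vectors have already been computed for the same images in the same order. The function names `f_test_equal_variances` and `pitman_test_equal_variances` are hypothetical, and the Pitman test is implemented through the classical identity Cov(a + b, a - b) = Var(a) - Var(b); this may differ in detail from the exact formulation used in the article. The simulated residuals in the demo are made up for illustration and are not LIVE results.

```python
import numpy as np
from scipy import stats


def f_test_equal_variances(res_a, res_b):
    """Two-sided F-test for equal variances.

    Assumes the two residual samples are independent, which is exactly the
    assumption that fails when both metrics are scored on the same images.
    """
    res_a = np.asarray(res_a, dtype=float)
    res_b = np.asarray(res_b, dtype=float)
    f_stat = np.var(res_a, ddof=1) / np.var(res_b, ddof=1)
    df_a, df_b = res_a.size - 1, res_b.size - 1
    # Two-sided p-value: twice the smaller tail probability, capped at 1.
    tail = min(stats.f.cdf(f_stat, df_a, df_b), stats.f.sf(f_stat, df_a, df_b))
    return f_stat, min(1.0, 2.0 * tail)


def pitman_test_equal_variances(res_a, res_b):
    """Pitman (Pitman-Morgan) test for equal variances of paired samples.

    Cov(a + b, a - b) = Var(a) - Var(b), so the variances are equal exactly
    when the sums and differences are uncorrelated. The sample correlation r
    of sums and differences is tested with t = r * sqrt((n - 2) / (1 - r^2)),
    which has n - 2 degrees of freedom under the null for Gaussian data.
    """
    res_a = np.asarray(res_a, dtype=float)
    res_b = np.asarray(res_b, dtype=float)
    if res_a.shape != res_b.shape or res_a.ndim != 1:
        raise ValueError("Pitman test needs two 1-D residual vectors of equal length")
    n = res_a.size
    r = np.corrcoef(res_a + res_b, res_a - res_b)[0, 1]
    t_stat = r * np.sqrt((n - 2) / (1.0 - r**2))
    p_value = 2.0 * stats.t.sf(abs(t_stat), n - 2)
    return t_stat, p_value


if __name__ == "__main__":
    # Hypothetical simulated residuals (not LIVE data): both metrics share an
    # image-level error component z, so their residuals are strongly correlated.
    rng = np.random.default_rng(0)
    n_images = 500
    z = rng.normal(size=n_images)
    res_metric_a = z + 0.3 * rng.normal(size=n_images)
    res_metric_b = z + 0.5 * rng.normal(size=n_images)

    f_stat, p_f = f_test_equal_variances(res_metric_a, res_metric_b)
    t_stat, p_pitman = pitman_test_equal_variances(res_metric_a, res_metric_b)
    print(f"F-test:      F = {f_stat:.3f}, p = {p_f:.4f}")
    print(f"Pitman test: t = {t_stat:.3f}, p = {p_pitman:.4f}")
```

The same Pitman statistic can be written in terms of the variance ratio F and the sample correlation r between the paired residuals as t = (F - 1) sqrt(n - 2) / (2 sqrt(F (1 - r^2))). This form shows why ignoring the pairing matters: the stronger the correlation between the two metrics' residuals, the larger |t| becomes for the same F, so the two tests can reach different conclusions on the same data.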
