Abstract

In subjective full-reference image quality assessment, differences between the perceptual image qualities of a reference image and its distorted versions are evaluated, often using degradation category ratings (DCR). However, DCR has been criticized because differences between rating categories on this ordinal scale might not be perceptually equidistant, and observers may have different understandings of the categories. Pair comparisons (PC) of distorted images, followed by Thurstonian reconstruction of scale values, overcome these problems. In addition, PC is more sensitive than DCR, and it can provide scale values in fractional just noticeable difference (JND) units that have a precise perceptual interpretation. Still, comparing images of nearly the same quality can be difficult. We introduce boosting techniques, embedded in more general triplet comparisons (TC), that increase the sensitivity even further. Boosting amplifies the artefacts of distorted images, enlarges their visual representation by zooming, increases the visibility of the distortions by a flickering effect, or combines several of these. Experimental results show the effectiveness of boosted TC for seven types of distortion. We crowdsourced over 1.7 million responses to triplet questions. A detailed analysis shows that boosting increases the discriminatory power and allows the number of subjective ratings to be reduced without sacrificing the accuracy of the resulting relative image quality values. Our technique paves the way to fine-grained image quality datasets, allowing for more distortion levels, yet with high-quality subjective annotations. We also provide the details for Thurstonian scale reconstruction from TC and our annotated dataset, KonFiG-IQA, containing 10 source images, processed using 7 distortion types at 12 or even 30 levels, uniformly spaced over a span of 3 JND units.
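The Thurstonian scale reconstruction mentioned above can be illustrated, in its simplest classical form, by Thurstone's Case V model applied to pairwise win counts: empirical preference proportions are mapped through the inverse normal CDF, and row means give the scale values. This is only a minimal sketch for intuition; the paper's actual reconstruction works on triplet comparisons with maximum-likelihood estimation, and the function name here is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def thurstone_case_v(counts):
    """Estimate quality scale values from a pairwise win-count matrix.

    counts[i, j] = number of times stimulus i was judged better than j.
    Returns scale values shifted so the lowest-quality stimulus is at 0.
    """
    n = counts + counts.T                      # total trials per pair
    p = np.where(n > 0, counts / np.maximum(n, 1), 0.5)
    p = np.clip(p, 0.01, 0.99)                 # avoid infinite z-scores
    z = norm.ppf(p)                            # inverse normal CDF
    scores = z.mean(axis=1)                    # Case V: average z per row
    return scores - scores.min()
```

Under the Case V assumptions (equal, uncorrelated Gaussian noise per stimulus), these scale values are expressed in units of the discriminal dispersion, which is what allows an interpretation in fractional JND units.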

Highlights

  • Full-reference image quality assessment (FR-IQA) quantifies the perceptual image qualities of distorted versions of pristine reference images

  • We used general triplets and showed their potential to further increase the performance of FR-IQA compared to degradation category ratings (DCR) and baseline triplet comparisons (TC)

  • We proposed artefact amplification, zooming, the flicker test, and their combinations
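Of the boosting techniques listed above, artefact amplification is the simplest to sketch: the residual between the distorted image and its reference is scaled up before display, making subtle distortions easier to judge. The following is a minimal illustrative sketch, not the paper's implementation; the function name and the choice of amplification factor are assumptions.

```python
import numpy as np

def amplify_artefacts(ref, dist, factor=2.0):
    """Boost distortions by scaling the residual (dist - ref) around the reference.

    ref, dist: uint8 images of identical shape. factor > 1 amplifies artefacts.
    """
    ref_f = ref.astype(np.float64)
    boosted = ref_f + factor * (dist.astype(np.float64) - ref_f)
    return np.clip(boosted, 0, 255).astype(np.uint8)
```

Zooming and flicker boosting would instead change how the stimulus is presented (enlarged crops, or rapid alternation between reference and distorted image) rather than its pixel values.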

Introduction

Full-reference image quality assessment (FR-IQA) quantifies the perceptual image qualities of distorted versions of pristine reference images. In perceptual image compression, for example, FR-IQA quantifies the tradeoff between bitrate and perceived quality, which helps optimize encoding parameters. Since it is not feasible to run a subjective study for each image in such applications, automated FR-IQA algorithms must be used that estimate the quality from the image data without any human interaction. To develop and train such FR-IQA algorithms, annotated image datasets, derived from subjective studies, are required. In such studies, subjects judge images according to their perceived quality, either individually or in comparison with one or more other images. This paper contributes boosting methods for the presentation of the image stimuli in subjective studies that improve the accuracy and sensitivity of the perceptual measurements.
