Screen content images (SCIs) typically combine several content types with sharp edges, so their artifacts and distortions can be detected effectively by a plain structure similarity measurement in a full-reference manner. Nonetheless, almost all current state-of-the-art (SOTA) structure similarity metrics are formulated “locally” at a single level, whereas the human visual system (HVS) operates in a multilevel manner; this mismatch can prevent these metrics from achieving reliable quality assessment. To address this issue, this article proposes a novel solution that measures structure similarity “globally” from the perspective of sparse representation. To perform multilevel quality assessment in accordance with the HVS, this global metric is integrated with conventional local ones through a newly devised selective deep fusion network. To validate its effectiveness, we compare our method with 12 SOTA methods on two widely used large-scale public SCI datasets; the quantitative results indicate that our method yields significantly higher consistency with subjective quality scores than the current leading works. Both the source code and the data are publicly available to facilitate further development and validation.