Subjective experiments are considered the most reliable way to assess perceived visual quality. However, observers’ opinions are highly diverse: even the same observer is often unable to exactly repeat their first opinion when rating a given stimulus again. As a result, the Mean Opinion Score (MOS) alone is in many cases not sufficient to obtain accurate information about the perceived visual quality. It is therefore important to have a measure that characterizes to what extent the observed or predicted MOS value is reliable and stable. For instance, the Standard deviation of the Opinions of the Subjects (SOS) could be considered a measure of reliability when evaluating quality subjectively. However, we are not aware of any models or algorithms that objectively predict how much diversity would be observed in subjects’ opinions in terms of SOS. In this work we observe, on the basis of a statistical analysis of several subjective experiments, that the disagreement between the quality scores produced by different objective video quality metrics (VQMs) can provide information on the diversity of the observers’ ratings of a given processed video sequence (PVS). In light of this observation we: i) propose and validate a model for the SOS observed in a subjective experiment; ii) design and train Neural Networks (NNs) that predict the average diversity that would be observed among the subjects’ ratings for a PVS, starting from a set of VQM values computed on that PVS; iii) give insights into how the same NN-based approach can be used to identify potential anomalies in the data collected in subjective experiments.
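To make the two quantities concrete, the following is a minimal sketch of how the MOS and SOS of a single PVS could be computed from the individual ratings. The function name `mos_and_sos`, the 5-point rating scale, and the example ratings are illustrative assumptions, not data or code from this work; the sketch also assumes the sample (n - 1) standard deviation, which is a choice the abstract does not specify.

```python
from statistics import mean, stdev


def mos_and_sos(ratings):
    """Return (MOS, SOS) for one PVS from its individual opinion scores.

    MOS is the arithmetic mean of the ratings. SOS is taken here as the
    sample standard deviation (n - 1 denominator), which is an assumption.
    """
    if len(ratings) < 2:
        raise ValueError("at least two ratings are needed to estimate SOS")
    return mean(ratings), stdev(ratings)


# Hypothetical ratings on a 5-point scale (made up for illustration only).
ratings_by_pvs = {
    "pvs_a": [4, 4, 4, 4, 4],  # full agreement among subjects
    "pvs_b": [2, 3, 4, 5, 1],  # strong disagreement among subjects
}

results = {name: mos_and_sos(r) for name, r in ratings_by_pvs.items()}

for name, (mos, sos) in results.items():
    print(f"{name}: MOS={mos:.2f}, SOS={sos:.2f}")
```

In this toy example the MOS values of the two sequences differ by only one point, while the SOS separates a sequence on which all subjects agree from one on which they strongly disagree. This is the kind of information that the MOS alone does not convey.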