Abstract

Crowdsourcing (CS) has evolved into a mature assessment methodology for subjective experiments in diverse scientific fields, in particular for QoE assessment. However, results acquired through CS on absolute category rating (ACR) scales are often not fully comparable to QoE assessments conducted in laboratory environments. A possible reason for such differences is the scale usage heterogeneity problem, caused by deviant scale usage of individual crowd workers. In this paper, we study different implementations of (quality) rating scales, varying in design and number of answer categories, to identify whether certain scales can help to overcome scale usage problems in crowdsourcing. Additionally, training of subjects is well known to enhance result quality in laboratory ACR evaluations. Hence, we analyzed the appropriateness of training conditions for overcoming scale usage problems across different samples in crowdsourcing. As major results, we found that filtering of user ratings and different scale designs are not sufficient to overcome scale usage heterogeneity, but that training sessions, despite their additional cost, enhance result quality in CS and effectively counteract the identified scale usage heterogeneity problems.
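To illustrate the scale usage heterogeneity problem described in the abstract, the following sketch simulates crowd workers who apply a personal offset and a compression of the extremes to their ratings on a 5-point ACR scale, and then applies a naive rating filter. The per-worker distortion model, the parameter values, and the z-score-style filtering step are illustrative assumptions for this sketch only, not the procedure used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 8 test conditions with "true" qualities on a 5-point ACR scale.
true_quality = np.linspace(1.5, 4.5, 8)

# Assumed heterogeneity model: each worker shifts the scale by a personal offset
# and compresses it around the midpoint (reduced use of the extreme categories).
n_workers = 50
offset = rng.normal(0.0, 0.6, n_workers)
compression = rng.uniform(0.5, 1.0, n_workers)

def rate(worker, quality):
    """One ACR rating: distort the true quality by the worker's scale usage, add noise, clip to 1..5."""
    distorted = 3 + compression[worker] * (quality - 3) + offset[worker]
    noisy = distorted + rng.normal(0.0, 0.4)
    return int(np.clip(round(noisy), 1, 5))

ratings = np.array([[rate(w, q) for q in true_quality] for w in range(n_workers)])

# MOS per condition from the raw crowd ratings.
mos_raw = ratings.mean(axis=0)

# Naive filtering (illustrative): drop workers whose mean rating deviates strongly
# from the crowd mean. This removes extreme raters but cannot undo the systematic
# compression of the scale, which is why filtering alone is insufficient.
worker_means = ratings.mean(axis=1)
keep = np.abs(worker_means - worker_means.mean()) < 2 * worker_means.std()
mos_filtered = ratings[keep].mean(axis=0)

for q, raw, filt in zip(true_quality, mos_raw, mos_filtered):
    print(f"true {q:.2f}  crowd MOS {raw:.2f}  filtered MOS {filt:.2f}")
```

Running the sketch shows that the crowd MOS for good and bad conditions is pulled toward the middle of the scale, and that the filtered MOS stays close to the unfiltered one, mirroring the abstract's observation that filtering and scale design alone do not remove scale usage heterogeneity.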
