Abstract

Subjective speech quality assessment has traditionally been carried out in laboratory environments under controlled conditions. With the advent of crowdsourcing platforms, tasks that require human intelligence can be completed by crowd workers over the Internet. Crowdsourcing also offers a new paradigm for speech quality assessment, promising higher ecological validity of the quality judgments at the expense of potentially lower reliability. This paper compares laboratory-based and crowdsourcing-based speech quality assessments in terms of comparability of results and efficiency. For this purpose, three pairs of listening-only tests were carried out using three different crowdsourcing platforms and following ITU-T Recommendation P.808. In each test, listeners judged the overall quality of each speech sample following the Absolute Category Rating procedure. We compare the results of the crowdsourcing approach with the results of standard laboratory tests performed according to ITU-T Recommendation P.800. Results show that in most cases, both paradigms lead to comparable results. Notable differences are discussed with respect to their sources, and conclusions are drawn that establish practical guidelines for crowdsourcing-based speech quality assessment.

Highlights

  • Quality of Experience (QoE) research concentrates on understanding user requirements towards systems or services, as well as their perceptions and judgments

  • We look at the validity, certainty gain, and reliability of crowdsourcing mean opinion scores (MOS) as a function of the number of votes

  • We extend them by considering larger simulation runs, more QoE metrics, and a method for aggregating the results of all metrics
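To make the notion of a mean opinion score "as a function of the number of votes" concrete, the following is a minimal sketch, not the authors' analysis. It assumes votes on the five-point ACR scale (1 to 5) and a normal-approximation 95% confidence interval (z = 1.96). The function name `mos_with_ci` and the example votes are hypothetical and chosen only for illustration.

```python
import math
import statistics


def mos_with_ci(votes, z=1.96):
    """Return (MOS, CI half-width) for a list of ACR votes on the 1-5 scale.

    Assumption: the confidence interval uses a normal approximation
    (z = 1.96 for roughly 95%); a t-quantile may be preferable for small n.
    """
    if len(votes) < 2:
        raise ValueError("need at least two votes to estimate a confidence interval")
    if any(v < 1 or v > 5 for v in votes):
        raise ValueError("ACR votes must lie on the 1-5 scale")
    mos = statistics.mean(votes)
    # Sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(votes)
    half_width = z * sd / math.sqrt(len(votes))
    return mos, half_width


# Hypothetical votes for one speech sample (illustrative, not data from the paper).
example_votes = [4, 3, 5, 4, 4, 2, 3, 4, 5, 4, 3, 4]

# Recompute the MOS and its interval as more votes become available.
for n in (4, 8, 12):
    mos, hw = mos_with_ci(example_votes[:n])
    print(f"n={n:2d}  MOS={mos:.2f}  95% CI=±{hw:.2f}")
```

Under this approximation the half-width shrinks roughly with the square root of the number of votes, which is one simple way to think about the certainty gained from collecting additional ratings.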


Summary

Introduction

Quality of Experience (QoE) research concentrates on understanding user requirements towards systems or services, as well as their perceptions and judgments. QoE studies have addressed systems or services for multimedia content creation, transmission, and rendering. Standardized guidelines exist for such experiments, e.g., in the Recommendations of the P-series of the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T), or in the Recommendations of the BS- and BT-series of the Radiocommunication Sector (ITU-R). These guidelines describe the requirements towards test participants, test design, set-up, procedure, and analysis, as well as the laboratory environment in which tests should be carried out.

