Abstract

A composite random variable is a product (or sum of products) of statistically distributed quantities. Such a variable can represent the solution to a multi-factor quantitative problem submitted to a large, diverse, independent, anonymous group of non-expert respondents (the “crowd”). The objective of this research is to examine the statistical distribution of solutions from a large crowd to a quantitative problem involving image analysis and object counting. Theoretical analysis by the author, covering a range of conditions and types of factor variables, predicts that composite random variables are distributed log-normally to an excellent approximation. If the factors in a problem are themselves distributed log-normally, then their product is exactly log-normal. A crowdsourcing experiment, devised by the author and implemented with the assistance of a BBC (British Broadcasting Corporation) television show, yielded a sample of approximately 2000 responses consistent with a log-normal distribution. The sample mean was within ~12% of the true count. However, a Monte Carlo simulation (MCS) of the experiment, employing either normal or log-normal random variables as factors to model the processes by which a crowd of 1 million might arrive at their estimates, resulted in a visually perfect log-normal distribution with a mean response within ~5% of the true count. The results of this research suggest that a well-modeled MCS, by simulating a sample of responses from a large, rational, and incentivized crowd, can provide a more accurate solution to a quantitative problem than might be attainable by direct sampling of a smaller crowd or of an uninformed crowd, irrespective of size, that guesses randomly.
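
The statement that a product of log-normal factors is itself log-normal can be written out in one line. The sketch below assumes the factors are independent; the symbols X_i, mu_i, sigma_i, and Z are notation introduced here for illustration and are not taken from the paper.

```latex
X_i \sim \mathrm{LogNormal}(\mu_i, \sigma_i^2), \quad i = 1, \dots, n
\;\Longrightarrow\;
\ln Z = \ln \prod_{i=1}^{n} X_i = \sum_{i=1}^{n} \ln X_i
\sim \mathcal{N}\!\Big(\sum_{i=1}^{n} \mu_i,\; \sum_{i=1}^{n} \sigma_i^2\Big)
```

Because each ln X_i is normal and a sum of independent normal variables is normal, ln Z is normal, so Z is log-normal with log-scale mean equal to the sum of the mu_i and log-scale variance equal to the sum of the sigma_i^2.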

Highlights

  • Estimation of an Unknown Composite Quantity by Large-Scale Sampling: The global reach of telecommunications media, including radio, television, and in particular the social media sites of the internet, makes possible an ease and scale of statistical sampling hitherto inconceivable

  • A Monte Carlo simulation (MCS) of the experiment, employing either normal or log-normal random variables as factors to model the processes by which a crowd of 1 million might arrive at their estimates, resulted in a visually perfect log-normal distribution with a mean response within ~5% of the true count

  • The analyst is faced with three general questions: 1) How are the basis random variables (RVs) distributed? 2) What will be the distribution of the composite RV? 3) Which statistic of the composite RV should be taken to represent the physical value of the sought-for quantity? This paper addresses these three questions by examining an archetypal estimation problem a) theoretically, b) computationally by Monte Carlo simulation, and c) experimentally

Summary

Introduction

The global reach of telecommunications media, including radio, television, and in particular the social media sites of the internet, makes possible an ease and scale of statistical sampling hitherto inconceivable. A composite RV is a product of two or more factor RVs. It is shown that: 1) the most useful characteristic of a crowdsourced sample is its distribution function and not just a single statistic, 2) under conditions to be specified, a product of RVs is distributed log-normally to an excellent approximation, irrespective of the type, number, or correlation of the factor RVs, 3) computer simulation methods can model the response of a hypothetical rational crowd orders of magnitude larger than what might be practically attainable.
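
To illustrate how a simulated crowd might be modeled, the following is a minimal Monte Carlo sketch using NumPy. It is not the author's simulation. It assumes three independent log-normal factors whose parameters (`MU`, `SIGMA`) are made up for illustration, and it covers only the exactly log-normal case. The function names `simulate_crowd` and `skewness` are likewise hypothetical.

```python
import numpy as np

# Hypothetical log-scale parameters (mu_i, sigma_i) for three factors.
# These are illustrative assumptions, not values from the paper.
MU = np.array([2.0, 3.0, 1.5])
SIGMA = np.array([0.20, 0.30, 0.25])

def simulate_crowd(n_respondents=200_000, seed=0):
    """Return one composite estimate per simulated respondent."""
    rng = np.random.default_rng(seed)
    # Each row is one respondent; each column is one log-normal factor.
    factors = rng.lognormal(mean=MU, sigma=SIGMA, size=(n_respondents, MU.size))
    # The composite RV is the product of the factors.
    return factors.prod(axis=1)

def skewness(x):
    """Sample skewness (third standardized moment)."""
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

estimates = simulate_crowd()
log_est = np.log(estimates)

# If the product is log-normal, ln Z is normal with these parameters.
print("mean of ln Z:", log_est.mean(), "expected:", MU.sum())
print("var of ln Z: ", log_est.var(), "expected:", (SIGMA ** 2).sum())
print("skewness of ln Z:", skewness(log_est))  # close to 0 for a normal ln Z
print("sample mean of Z:  ", estimates.mean())
print("sample median of Z:", np.median(estimates))
```

For a log-normal variable the mean exceeds the median, so the two printed statistics differ systematically. This is one reason the full distribution of responses, rather than a single statistic, matters when choosing a representative value.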

Background
Organization
Monte-Carlo Simulations of a Composite Random Variable
Commentary
Test of Crowdsourced Estimation
The Coin-Estimation Experiment
Monte Carlo Simulation of the Coin Estimation Experiment
Commentary on the Experiment and Simulations
Findings
Conclusions
