Extraction of Information from Crowdsourcing: Experimental Test Employing Bayesian, Maximum Likelihood, and Maximum Entropy Methods

M P Silverman

doi:10.4236/ojs.2019.95038

Abstract

In a crowdsourcing experiment, viewers (the “crowd”) of a British Broadcasting Corporation (BBC) television show submitted estimates of the number of coins in a tumbler; these estimates were shown in an antecedent paper (Part 1) to follow a log-normal distribution Λ(m, s²). The coin-estimation experiment is an archetype of a broad class of image analysis and object counting problems suitable for solution by crowdsourcing. The objective of the current paper (Part 2) is to determine the location and scale parameters (m, s) of Λ(m, s²) by both Bayesian and maximum likelihood (ML) methods and to compare the results. One outcome of the analysis is the resolution, by means of Jeffreys’ rule, of questions regarding the appropriate Bayesian prior. It is shown that Bayesian and ML analyses lead to the same expression for the location parameter, but different expressions for the scale parameter, which become identical in the limit of an infinite sample size. A second outcome of the analysis concerns use of the sample mean as the measure of information of the crowd in applications where the distribution of responses is not sought or known. In the coin-estimation experiment, the sample mean was found to differ widely from the mean number of coins calculated from Λ(m, s²). This discordance raises critical questions concerning whether, and under what conditions, the sample mean provides a reliable measure of the information of the crowd. This paper resolves that problem by use of the principle of maximum entropy (PME). The PME yields a set of equations for finding the most probable distribution consistent with given prior information and only that information. If there is no solution to the PME equations for a specified sample mean and sample variance, then the sample mean is an unreliable statistic, since no measure can be assigned to its uncertainty.
Parts 1 and 2 together demonstrate that the information content of crowdsourcing resides in the distribution of responses (very often log-normal in form), which can be obtained empirically or by appropriate modeling.
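
As an illustration of the comparison described in the abstract, the following minimal sketch computes the standard maximum likelihood estimates of the log-normal parameters, namely m as the mean of ln x and s² as the mean squared deviation of ln x (divisor n), and contrasts the plain sample mean with the log-normal mean exp(m + s²/2). The data are synthetic, generated with assumed parameters (m = 6.0, s = 0.8) chosen only for illustration; they are not the BBC data, and the function names are hypothetical. The Bayesian estimates derived in the paper are not reproduced here.

```python
import math
import random


def lognormal_ml(samples):
    """Return ML estimates (m_hat, s_hat) of the log-normal parameters.

    m_hat is the mean of ln(x); s_hat**2 is the mean squared deviation
    of ln(x) about m_hat, using divisor n (the ML form), not n - 1.
    """
    logs = [math.log(x) for x in samples]
    n = len(logs)
    m_hat = sum(logs) / n
    s2_hat = sum((y - m_hat) ** 2 for y in logs) / n
    return m_hat, math.sqrt(s2_hat)


def lognormal_mean(m, s):
    """Expectation value of a log-normal variate: exp(m + s^2 / 2)."""
    return math.exp(m + 0.5 * s * s)


if __name__ == "__main__":
    # Synthetic "crowd" estimates; the parameters below are assumptions
    # made for this illustration, not values taken from the paper.
    rng = random.Random(0)
    crowd = [rng.lognormvariate(6.0, 0.8) for _ in range(2000)]

    m_hat, s_hat = lognormal_ml(crowd)
    sample_mean = sum(crowd) / len(crowd)

    print("ML location m:", m_hat)
    print("ML scale s:", s_hat)
    print("sample mean:", sample_mean)
    print("log-normal mean exp(m + s^2/2):", lognormal_mean(m_hat, s_hat))
    print("log-normal median exp(m):", math.exp(m_hat))
```

For well-behaved synthetic data the two means tend to agree closely; the discordance reported in the abstract concerns real crowd responses, where a few extreme estimates can pull the sample mean far from the value implied by the fitted distribution.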

Highlights

[…] Λ(m, s²), more accurately reflects the information contained in the collective response of the crowd? These questions are resolved in Section 5.3 by first examining a third estimation procedure based on the principle of maximum entropy (PME).
There remains Question (3): Which statistic better represents the information of the crowd: the sample mean of a falsely presumed Gaussian distribution, or the expectation value calculated from the appropriate log-normal distribution? The answer to this question is somewhat subjective, since it depends on how one views the process of crowdsourcing and what one expects to learn from it.

Summary

Introduction

In a previous paper [1], to be designated Part 1, the author described a crowdsourcing experiment, implemented in collaboration with a British Broadcasting Corporation (BBC) television show, to solve a quantitative problem involving image analysis and object counting. The objective of the experiment was twofold: (1) to compare the true solution with the solution obtained by sampling the estimates submitted by a large number of participating BBC viewers (the “crowd”), and (2) to find the statistical distribution of the individual responses from the crowd. The present paper, to be designated Part 2, extends the statistical analysis of crowdsourcing. Whereas Part 1 was concerned primarily with the identity and universality of the distribution of crowd responses, Part 2 investigates the parameters by which this distribution is defined and discusses the procedure to be employed when the distribution of crowd responses is not known.

Estimation of Distribution Parameters

Organization

Maximum Likelihood Estimate of Log-Normal Parameters

Bayesian Analysis of the Coin Estimation Experiment

Crowdsourcing and the Maximum Entropy Distribution

Maximum Likelihood Solution to the Maximum Entropy Equations

Answers to the Three Questions of Section 4

Quantitative Measure of Information Content

Conclusions


Journal: Open Journal of Statistics
Publication Date: Jan 1, 2019
Citations: 2
License type: CC BY 4.0
