Nonparametric Mean Estimation for Big-but-Biased Data

Laura Borrajo, Ricardo Cao

doi:10.3390/proceedings2181167

Abstract

Some authors have recently warned about the risks of the claim "with enough data, the numbers speak for themselves." This work considers the problem of nonparametric statistical inference in big data in the presence of sampling bias. Mean estimation is studied in this setup, in a nonparametric framework, when the biasing weight function is unknown, which is the realistic case. The lack of knowledge of the weight function is remedied by having a small simple random sample (SRS) of the real population. This problem is related to nonparametric density estimation. The asymptotic expression for the mean squared error (MSE) of the proposed estimator is considered. Some simulations illustrate the performance of the nonparametric method proposed in this work.
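Why a small SRS is enough to undo an unknown bias can be seen from a standard weighted-sampling identity. The derivation below is an illustrative sketch under assumed notation, not reproduced from the paper. Here f is the population density, w > 0 the unknown weight function, g the density of the big biased sample X_1, ..., X_n, and the SRS Y_1, ..., Y_m is drawn from f. The population mean depends on w only up to a constant, so the ratio g/f suffices. Replacing f and g by kernel density estimates built from the SRS and from the biased sample gives one natural plug-in estimator. It is an assumption here, not necessarily the estimator the authors analyze.

```latex
\begin{aligned}
g(x) &= \frac{w(x)\,f(x)}{\mu_w}, \qquad \mu_w = \int w(t)\,f(t)\,dt,\\
\mu &= \int x\,f(x)\,dx = \mu_w\,E_g\!\left[\frac{X}{w(X)}\right],
\qquad 1 = \int f(x)\,dx = \mu_w\,E_g\!\left[\frac{1}{w(X)}\right],\\
\mu &= \frac{E_g[X/w(X)]}{E_g[1/w(X)]},
\qquad \frac{g(x)}{f(x)} = \frac{w(x)}{\mu_w},\\
\hat\mu &= \frac{\sum_{i=1}^{n} X_i\,\hat f(X_i)/\hat g(X_i)}{\sum_{i=1}^{n} \hat f(X_i)/\hat g(X_i)}.
\end{aligned}
```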

Highlights

At certain times a large sample is not representative of the population but is biased: big-but-biased data (B3D).
Some of the problems arising from ignoring sampling bias in big data statistical analysis have recently been reported by Cao [1].
A good example cited by Crawford [2] is the data collected in the city of Boston through the StreetBump smartphone app. That data underestimates the number of potholes in some neighborhoods of the city, leading to poor allocation of resources. Another example is the database of more than 20 million tweets generated during Hurricane Sandy. These data come from a biased sample of the population, since most of the tweets came from Manhattan, while few tweets originated in the areas most affected by the catastrophe.

Summary

Introduction

At certain times a large sample is not representative of the population but is biased: big-but-biased data (B3D). A good example cited by Crawford [2] is the data collected in the city of Boston through the StreetBump smartphone app. That data underestimates the number of potholes in some neighborhoods of the city, leading to poor allocation of resources. Another example is the database of more than 20 million tweets generated during Hurricane Sandy. These data come from a biased sample of the population, since most of the tweets came from Manhattan, while few tweets originated in the areas most affected by the catastrophe. In other examples, such as those cited by Hargittai [3], survey data show that the use of online sites is biased, yielding samples that limit the generalizability of findings.

Mean Estimation in B3D
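The abstract does not spell out the estimator, so the following is only a minimal Python sketch under stated assumptions. It estimates the unknown weight function, up to a constant, as the ratio of a kernel density estimate of the big biased sample to one of the small SRS. It then reweights each biased observation by the inverse of that estimate. Several choices here are hypothetical and made for illustration only: the Gaussian kernel, Silverman's rule-of-thumb bandwidth, and the helper names `gaussian_kde_eval` and `b3d_mean`. The simulated setting is also an assumption: population N(0, 1) and weight function w(x) = exp(x), which makes the biased density N(1, 1). None of this is the authors' method or their simulation design.

```python
import numpy as np

# Hypothetical helpers for illustration; not the authors' implementation.


def gaussian_kde_eval(sample, points, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate built from `sample` at `points`.

    When no bandwidth is given, Silverman's rule of thumb is used
    (an illustrative default, not a data-driven selector).
    """
    sample = np.asarray(sample, dtype=float)
    points = np.asarray(points, dtype=float)
    if bandwidth is None:
        bandwidth = 1.06 * sample.std(ddof=1) * sample.size ** (-1 / 5)
    # Standardized differences between every evaluation point and every
    # sample point, shape (len(points), len(sample)).
    u = (points[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def b3d_mean(biased_sample, srs, h_f=None, h_g=None, floor=1e-12):
    """Reweighted mean of a big biased sample, using a small SRS.

    The weight function is estimated up to a constant as g_hat / f_hat,
    where g_hat is a KDE of the biased sample and f_hat a KDE of the SRS,
    so each biased observation X_i receives weight f_hat(X_i) / g_hat(X_i).
    """
    x = np.asarray(biased_sample, dtype=float)
    f_hat = gaussian_kde_eval(srs, x, h_f)  # population density at X_i
    g_hat = gaussian_kde_eval(x, x, h_g)    # biased-sample density at X_i
    inv_w = f_hat / np.maximum(g_hat, floor)  # guard against division by ~0
    return np.sum(x * inv_w) / np.sum(inv_w)


# Simulated example (assumed setting): population f = N(0, 1) and
# weight w(x) = exp(x), so the biased density g is N(1, 1); true mean is 0.
rng = np.random.default_rng(2018)
big_biased = rng.normal(1.0, 1.0, size=2000)
small_srs = rng.normal(0.0, 1.0, size=200)

naive = big_biased.mean()                    # ignores the sampling bias
corrected = b3d_mean(big_biased, small_srs)  # reweighted estimate
print(f"naive mean: {naive:.3f}, reweighted mean: {corrected:.3f}")
```

The naive mean should land near 1, the mean of the biased density, while the reweighted estimate should move toward the true value 0. The exact numbers depend on the seed. The pairwise kernel evaluation costs O(n·m) time and memory, which is fine for illustration but not for truly big samples; binned or FFT-based density estimation would be the usual remedy. The bandwidths `h_f` and `h_g` control the smoothing of both density estimates and therefore affect the accuracy of the mean estimate.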

Case Study with Simulated Data

Conclusions


Publication Date: Sep 19, 2018
Citations: 1
License type: CC BY 4.0
