Online Nonprobability Samples

Abstract

Online nonprobability samples provide social scientists with opportunities to conduct surveys and experiments on large, diverse samples at modest prices. Researchers may find bewildering the options offered by the many commercial entities that provide research participants, and our review seeks to orient researchers to key issues about their use. We discuss principles and evidence regarding estimates from nonprobability samples versus those from probability samples. We also describe methods for addressing certain types of problem participants that one encounters in these samples: professional respondents, participants who are inattentive or have low linguistic competence, and bogus participants (increasingly in the form of bots). We urge researchers not to take data quality for granted, not to rely on indirect information to vouch for data quality, and to proactively build methods that allow for the evaluation of data quality into their instruments.
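The call to build data-quality checks into the instrument can be illustrated with a minimal sketch. It is not the authors' procedure. The record layout, the field names (`seconds`, `check`, the grid items), and the 30%-of-median speeding cutoff are all hypothetical choices made for this example.

```python
from statistics import median


def flag_low_quality(responses, grid_keys, attention_key, attention_answer,
                     speed_fraction=0.3):
    """Return {respondent_id: [reasons]} for a list of survey records.

    All field names and the speed_fraction cutoff are illustrative
    assumptions, not values taken from the review.
    """
    # Median completion time across all respondents, used as the speed baseline.
    median_seconds = median(r["seconds"] for r in responses)
    flags = {}
    for r in responses:
        reasons = []
        # Straightlining: the same answer given to every item in a grid.
        if len({r[k] for k in grid_keys}) == 1:
            reasons.append("straightlining")
        # Speeding: finished in far less time than the typical respondent.
        if r["seconds"] < speed_fraction * median_seconds:
            reasons.append("speeding")
        # Instructed-response item answered incorrectly.
        if r[attention_key] != attention_answer:
            reasons.append("failed_attention_check")
        flags[r["id"]] = reasons
    return flags


# Hypothetical records: three grid items (q1..q3) and one attention check.
sample = [
    {"id": "a", "seconds": 300, "q1": 1, "q2": 4, "q3": 2, "check": 3},
    {"id": "b", "seconds": 40, "q1": 5, "q2": 5, "q3": 5, "check": 1},
    {"id": "c", "seconds": 280, "q1": 2, "q2": 3, "q3": 2, "check": 3},
]
result = flag_low_quality(sample, ["q1", "q2", "q3"], "check", 3)
```

Flags of this kind are better treated as inputs to a sensitivity analysis than as automatic exclusion rules, since a single indicator can misclassify attentive respondents.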

Similar Papers
  • Research Article
  • Citations: 60
  • 10.1177/0898264306291420
Using Probability vs. Nonprobability Sampling to Identify Hard-to-Access Participants for Health-Related Research
  • Aug 1, 2006
  • Journal of Aging and Health
  • Lucy Feild + 4 more

This article compares the recruitment costs and participant characteristics associated with the use of probability and nonprobability sampling strategies in a longitudinal study of older hemodialysis patients and their spouses. Contrasts were made of people who accrued to the study based on probability and nonprobability sampling strategies. Probability-based sampling was more time-efficient and cost-effective than nonprobability sampling. There were no significant differences between the respondents identified through probability and nonprobability sampling on age, gender, years married, education, work status, and professional job status. Respondents from the probability sample were more likely to be Protestant and less likely to be Catholic than those from the nonprobability sample. Respondents from the probability sample were more likely to be Black, whereas those from the nonprobability sample were more likely to be White. There are strengths and shortcomings associated with both nonprobability and probability sampling. Researchers need to consider representativeness and external validity issues when designing sampling and related recruitment plans for health-related research.

  • Book Chapter
  • Citations: 105
  • 10.1002/9781118763520.ch2
A critical review of studies investigating the quality of data obtained with online panels based on probability and nonprobability samples
  • Apr 11, 2014
  • Mario Callegaro + 3 more

This chapter provides an overview of studies comparing the quality of data collected by online survey panels by looking at three criteria: (1) comparisons of point estimates from online panels to high-quality, established population benchmarks; (2) comparisons of the relationship among variables; and (3) the reproducibility of results for online survey panels conducted on probability samples compared with panels conducted on nonprobability samples. When looking at point estimates, all online survey panels differed to some extent from the population benchmarks. However, the largest comparison studies suggest that point estimates from online panels of nonprobability samples differ more from benchmarks than those from online panels of probability samples. This finding is consistent across time and across studies conducted in different countries. Moreover, post-stratification weighting strategies helped little and in an inconsistent way to reduce such differences for data coming from online panels of nonprobability samples, whereas these strategies did bring estimates from online panels of probability samples consistently closer to the benchmarks. When comparing relationships among variables, it was found that researchers would reach different conclusions when using online panels of nonprobability samples versus panels of probability samples. When looking at reproducibility of results, the limited evidence found suggests that there are no substantial differences in replication and effect size across probability and nonprobability samples for question wording experiments and when comparing student samples to other samples. It is worth noting that in pre-election polls, an area where abundant prior knowledge exists, online panels of nonprobability samples have consistently performed as well as, and in some cases better than, polls based on probability samples in predicting election winners.

  • Research Article
  • 10.1016/j.procs.2024.09.094
Intelligent Monitoring of Data Quality Based on Multiple Data Structures
  • Jan 1, 2024
  • Procedia Computer Science
  • Yanhong Bai

  • Research Article
  • Citations: 58
  • 10.1016/j.drugalcdep.2017.12.036
Comparing substance use and mental health outcomes among sexual minority and heterosexual women in probability and non-probability samples
  • Feb 21, 2018
  • Drug and Alcohol Dependence
  • Laurie A Drabble + 5 more

  • Research Article
  • Citations: 4
  • 10.1080/03610918.2022.2102181
Analysis of combined probability and nonprobability samples: a simulation evaluation and application to a teen smoking behavior survey
  • Jul 14, 2022
  • Communications in Statistics - Simulation and Computation
  • Wenna Xi + 6 more

In scientific studies with low-prevalence outcomes, probability sampling may be supplemented by nonprobability sampling to boost the sample size of the desired subpopulation while remaining representative of the entire study population. To utilize both probability and nonprobability samples appropriately, several methods have been proposed in the literature to generate pseudo-weights, including ad-hoc weights, inclusion probability adjusted weights, and propensity score adjusted weights. We empirically compare various weighting strategies via an extensive simulation study, where probability and nonprobability samples are combined. Weight normalization and raking adjustment are also considered. Our simulation results suggest that the unity weight method (with weight normalization) and the inclusion probability adjusted weight method yield very good overall performance. This work is motivated by the Buckeye Teen Health Study, which examines risk factors for the initiation of smoking among teenage males in Ohio. To address the low response rate in the initial probability sample and low prevalence of smokers in the target population, a small convenience sample was collected as a supplement. Our proposed method yields estimates very close to the ones from the analysis using only the probability sample and enjoys the additional benefit of being able to track more teens with risky behaviors through follow-ups.

  • Book Chapter
  • Citations: 1
  • 10.4324/9781003025245-13
Inference from probability and nonprobability samples
  • Nov 10, 2021
  • Rebecca Andridge + 1 more

In the absence of time and monetary constraints, the ideal way to measure characteristics of a finite population (e.g., residents of the United States) would be to measure every member of the population, in other words, conduct a census. However, this goal is rarely achievable, and researchers must instead rely on measurements from a subset of the population, called a sample. Broadly speaking, there are two methods for obtaining a sample from a population: probability sampling and nonprobability sampling. In a probability sample, each unit in the population has a known, positive (non-zero) probability of being selected into the sample, and randomness, controlled by the designer of the survey, is involved in the selection of which units actually get included in one particular sample. In contrast, in nonprobability sampling the probability that a unit is observed is not in the control of the survey designer; for example, when a sample is based on volunteers. In this chapter, we review the fundamentals of probability sampling, including key design features (e.g., stratification) and traditional methods for inference. We then describe nonprobability samples, which unlike probability sampling cannot neatly fit into a single framework to describe either their design or their resulting inference. We provide examples of nonprobability samples that have worked and that have failed and describe the main problems with nonprobability samples, such as selection bias and nonresponse, comparing with properties of probability samples throughout. The two main approaches to inference for nonprobability samples are described (quasi-randomization, superpopulation modeling), and diagnostics for selection bias are briefly described.

  • Research Article
  • Citations: 21
  • 10.1158/1055-9965.epi-18-0797
Weighting Nonprobability and Probability Sample Surveys in Describing Cancer Catchment Areas
  • Mar 1, 2019
  • Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology
  • Ronaldo Iachan + 7 more

The Population Health Assessment initiative by NCI sought to enhance cancer centers' capacity to acquire, aggregate, and integrate data from multiple sources, as well as to plan, coordinate, and enhance catchment area analysis activities. Key objectives of this initiative are pooling data and comparing local data with national data. A novel aspect of analyzing data from this initiative is the methodology used to weight datasets from sites that collected both probability and nonprobability samples. This article describes the methods developed to weight data that cancer centers collected with combinations of probability and nonprobability sampling designs. We compare alternative weighting methods, in particular for the hybrid probability and nonprobability sampling designs employed by different cancer centers. We also include comparisons of local center data with national survey data from large probability samples. This hybrid approach to calculating statistical weights can be implemented within cancer centers that collect both probability and nonprobability samples with common measures. Aggregation can also apply to cancer centers that share common data elements and target similar populations but differ in survey sampling designs. Researchers interested in local versus national comparisons for cancer surveillance and control outcomes should consider various weighting approaches, including hybrid approaches, when analyzing their data.

  • Research Article
  • Citations: 31
  • 10.2478/jos-2019-0027
Supplementing Small Probability Samples with Nonprobability Samples: A Bayesian Approach
  • Sep 1, 2019
  • Journal of Official Statistics
  • Joseph W Sakshaug + 3 more

Carefully designed probability-based sample surveys can be prohibitively expensive to conduct. As such, many survey organizations have shifted away from using expensive probability samples in favor of less expensive, but possibly less accurate, nonprobability web samples. However, their lower costs and abundant availability make them a potentially useful supplement to traditional probability-based samples. We examine this notion by proposing a method of supplementing small probability samples with nonprobability samples using Bayesian inference. We consider two semi-conjugate informative prior distributions for linear regression coefficients based on nonprobability samples, one accounting for the distance between maximum likelihood coefficients derived from parallel probability and non-probability samples, and the second depending on the variability and size of the nonprobability sample. The method is evaluated in comparison with a reference prior through simulations and a real-data application involving multiple probability and nonprobability surveys fielded simultaneously using the same questionnaire. We show that the method reduces the variance and mean-squared error (MSE) of coefficient estimates and model-based predictions relative to probability-only samples. Using actual and assumed cost data we also show that the method can yield substantial cost savings (up to 55%) for a fixed MSE.

  • Research Article
  • Citations: 5
  • 10.1109/jsyst.2020.2985343
Framework for Integral Data Quality and Security Evaluation in Smartphones
  • Jun 1, 2021
  • IEEE Systems Journal
  • Igor Khokhlov + 2 more

The concept of data quality (DQ) should play an important role in decision-making and engineering systems. Underestimation of DQ may lead to resource waste, wrong conclusions, or inefficient decisions. Unfortunately, current approaches to incorporating DQ into data management systems are limited to particular applications. This problem is aggravated by the DQ inequality of data sources. This is especially critical in mobile crowd-sensing applications where data may come from unverified data contributors using smartphones and other mobile devices. To facilitate the expansion of DQ evaluation to a wider spectrum of applications, this article presents a framework for integral DQ and security evaluation in Android-based smartphones. The developed framework provides support for selecting the DQ metrics and implementing their calculus by integrating diverse sensor DQ and security metrics. We present multiple calculi for DQ and security evaluation such as hierarchical fuzzy rules expert system, neural networks, and algebraic functions. Case studies that demonstrate the framework's performance in addressing real-life tasks are presented and the achieved results are analyzed. The implementation results validate the framework's capability of performing comprehensive DQ evaluations.

  • Research Article
  • Citations: 94
  • 10.1093/jssam/smz051
Integrating Probability and Nonprobability Samples for Survey Inference
  • Jan 27, 2020
  • Journal of Survey Statistics and Methodology
  • Arkadiusz Wiśniowski + 3 more

Survey data collection costs have risen to a point where many survey researchers and polling companies are abandoning large, expensive probability-based samples in favor of less expensive nonprobability samples. The empirical literature suggests this strategy may be suboptimal for multiple reasons, among them that probability samples tend to outperform nonprobability samples on accuracy when assessed against population benchmarks. However, nonprobability samples are often preferred due to convenience and costs. Instead of forgoing probability sampling entirely, we propose a method of combining both probability and nonprobability samples in a way that exploits their strengths to overcome their weaknesses within a Bayesian inferential framework. By using simulated data, we evaluate supplementing inferences based on small probability samples with prior distributions derived from nonprobability data. We demonstrate that informative priors based on nonprobability data can lead to reductions in variances and mean squared errors for linear model coefficients. The method is also illustrated with actual probability and nonprobability survey data. A discussion of these findings, their implications for survey practice, and possible research extensions are provided in conclusion.

  • Book Chapter
  • Citations: 6
  • 10.1093/acrefore/9780190264079.013.859
Sampling from Online Panels
  • Jun 18, 2024
  • Oxford Research Encyclopedia of Criminology and Criminal Justice
  • Luzi Shi + 1 more

Since the 2010s, online surveys have become a popular method among criminologists. Often these surveys are conducted with the assistance of private survey research companies, which gather large groups of people (i.e., respondents) who have indicated a willingness to share their opinions on a variety of issues. These panels of potential respondents vary in size and quality. Researchers planning to collect survey data via these online panels must also consider probability versus non-probability sampling methods. Probability samples provide stronger assurances that sample statistics—particularly, univariate point estimates—are generalizable to broader populations (e.g., adult Americans). They are also often very expensive, although this is somewhat dependent on the size and complexity of the proposed project. Two popular providers of probability samples of the American public are the Ipsos Knowledge Panel and the AmeriSpeak Omnibus panel. In criminology and criminal justice, researchers have used online probability panels to study a variety of topics, including behaviors regarding firearms, attitudes toward policing, and experiences of violence. Non-probability samples present a budget-friendly alternative but may be less generalizable to populations of interest. Since 2010, these samples have become especially popular in the criminological literature and are much more commonly used than online probability samples. Findings from non-probability online surveys often yield remarkably similar relational inferences (e.g., correlations) to those obtained from probability samples. However, non-probability samples are generally unsuitable for providing generalizable univariate point estimates. Some of the leading providers of non-probability samples from panels are YouGov, Qualtrics, and Lucid. As of 2024, YouGov uses a matched opt-in sample with a more sophisticated sampling design, while Qualtrics and Lucid provide quota samples. 
Researchers may also directly recruit non-probability samples of respondents via crowdsourcing platforms, such as Amazon Mechanical Turk, or services that incorporate those platforms into their own business model, such as CloudResearch. Research suggests that platforms with more sophisticated sampling procedures tend to yield more accurate results. Consequently, matched opt-in samples such as YouGov are approximately twice as expensive as Qualtrics samples and are many times more expensive than crowdsourcing platforms. Finally, it should be noted that the demographic composition of online samples, even those that have been simply crowdsourced, tends to be more diverse than that of typical in-person non-probability samples used in criminology and criminal justice research (e.g., college students).

  • Research Article
  • Citations: 2
  • 10.1093/jssam/smad032
Using Auxiliary Information in Probability Survey Data to Improve Pseudo-Weighting in Nonprobability Samples: A Copula Model Approach
  • Sep 12, 2023
  • Journal of Survey Statistics and Methodology
  • Tingyu Zhu + 4 more

While probability sampling has been considered the gold standard of survey methods, nonprobability sampling is increasingly popular due to its convenience and low cost. However, nonprobability samples can lead to biased estimates due to the unknown nature of the underlying selection mechanism. In this article, we propose parametric and semiparametric approaches to integrate probability and nonprobability samples using common ancillary variables observed in both samples. In the parametric approach, the joint distribution of ancillary variables is assumed to follow the latent Gaussian copula model, which is flexible to accommodate both categorical and continuous variables. In contrast, the semiparametric approach requires no assumptions about the distribution of ancillary variables. In addition, logistic regression is used to model the mechanism by which population units enter the nonprobability sample. The unknown parameters in the copula model are estimated through the pseudo maximum likelihood approach. The logistic regression model is estimated by maximizing the sample likelihood constructed from the nonprobability sample. The proposed method is evaluated in the context of estimating the population mean. Our simulation results show that the proposed method is able to correct the selection bias in the nonprobability sample by consistently estimating the underlying inclusion mechanism. By incorporating additional information in the nonprobability sample, the combined method can estimate the population mean more efficiently than using the probability sample alone. A real-data application is provided to illustrate the practical use of the proposed method.

  • Conference Article
  • 10.1145/2184751.2184861
An evaluation of input data quality of lifelog analysis application with a framework based on quantitative index
  • Feb 20, 2012
  • Akika Yamashita + 2 more

In recent years, improvements in data acquisition technology and the development of storage have made it far easier than before to collect lifelogs, which record a person's behavior as digital data. As a result, various lifelog analysis applications have been developed that offer the user useful information, such as a person's action histories, by analyzing data collected by sensor terminals, video cameras, and so on. However, in these lifelog analysis applications, the quality of the data collected from the sensor terminals and input to the application has not been discussed in detail. Therefore, in this paper, we focus on the quality of video image data and the acceleration data of objects. As a representative lifelog analysis application, we have chosen an application that verbalizes a person's behavior from the data, and we show the influence of input data quality on the application's execution result using a quantitative index. An evaluation framework is proposed for discussing the correlation between input data and the execution results of the application. A Bayesian classifier and an HMM are employed as data processing methods in this paper. Under various conditions, we clarify how the quality of input data affects the result of the lifelog analysis application.

  • Research Article
  • Citations: 57
  • 10.1027/1614-2241/a000094
Internet Panels, Professional Respondents, and Data Quality
  • Oct 1, 2015
  • Methodology
  • Suzette M Matthijsse + 2 more

Most web surveys collect data through nonprobability or opt-in online panels, which are characterized by self-selection. A concern in online research is the emergence of professional respondents, who frequently participate in surveys and are mainly doing so for the incentives. This study investigates if professional respondents can be distinguished in online panels and if they provide lower quality data than nonprofessionals. We analyzed a data set of the NOPVO (Netherlands Online Panel Comparison) study that includes 19 panels, which together capture 90% of the respondents in online market research in the Netherlands. Latent class analysis showed that four types of respondents can be distinguished, ranging from the professional respondent to the altruistic respondent. A profile of professional respondents is depicted. Professional respondents appear not to be a great threat to data quality.

  • Book Chapter
  • Citations: 138
  • 10.1002/9781118763520.ch10
Professional respondents in nonprobability online panels
  • Apr 11, 2014
  • D Sunshine Hillygus + 2 more

It is well-documented that there exists a pool of frequent survey takers who participate in many different online nonprobability panels in order to earn cash or other incentives, the so-called 'professional' respondents. Despite widespread concern about the impact of these professional respondents on data quality, there is not a clear understanding of how they might differ from other respondents. This chapter reviews the previous research and expectations regarding professional respondents and then examines how frequent survey taking and multiple panel participation affect data quality in the 2010 Cooperative Congressional Election Study. In contrast to common assumptions, we do not find overwhelming and consistent evidence that frequent survey takers are more likely to satisfice. On the contrary, frequent survey takers spent more time completing the questionnaire, were less likely to attrite, were less likely to straightline, and reported putting more effort into answering the survey. While panel memberships and number of surveys completed were related to skipping questions, answering "don't know," or giving junk responses to open-ended questions, these relationships did not hold once we account for levels of political knowledge. However, our analysis finds that higher levels of participation in surveys and online panels are associated with lower levels of political knowledge, interest, engagement, and ideological extremism. These findings suggest there could be contrasting motivations for those volunteering to participate in nonprobability panel surveys, with professional respondents taking part for the incentives and nonprofessional respondents taking part based on interest in the survey topic. As such, eliminating professional respondents from survey estimates, as some have recommended, would actually result in a more biased estimate of political outcomes.
