SamplingStrata: An R Package for the Optimization of Stratified Sampling

Giulio Barcaroli

doi:10.18637/jss.v061.i04

Abstract

When designing a sampling survey, constraints are usually set on the desired precision levels of one or more target estimates (the Ys). If a sampling frame is available, containing auxiliary information related to each unit (the Xs), it is possible to adopt a stratified sample design. For any given stratification of the frame, in the multivariate case it is possible to solve the problem of the best allocation of units in strata by minimizing a cost function subject to precision constraints (or, conversely, by maximizing the precision of the estimates under a given budget). The problem is to determine the best stratification of the frame, i.e., the one that ensures the overall minimal cost of the sample necessary to satisfy the precision constraints. The Xs can be categorical or continuous; continuous ones can be transformed into categorical ones. The most detailed stratification is given by the Cartesian product of the Xs (the atomic strata). One way to determine the best stratification is to explore exhaustively the set of all possible partitions derivable from the set of atomic strata, evaluating each one by calculating the corresponding cost in terms of the sample required to satisfy the precision constraints. This is unaffordable in practical situations, where the dimension of the space of the partitions can be very high. Another possible way is to explore the space of partitions with an algorithm that is particularly suitable in such situations: the genetic algorithm. The R package SamplingStrata, based on the use of a genetic algorithm, makes it possible to determine the best stratification for a population frame, i.e., the one that ensures the minimum sample cost necessary to satisfy the precision constraints, in a multivariate and multi-domain case.
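To give a sense of why exhaustive exploration is unaffordable: the number of ways to partition a set of n elements is the Bell number B(n), so this is also the number of candidate stratifications obtainable by grouping n atomic strata. The following Python sketch (not part of SamplingStrata; the function name is ours) computes Bell numbers with the Bell triangle.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)], computed with the Bell triangle."""
    bells = [1]  # B(0) = 1
    row = [1]    # first row of the Bell triangle
    for _ in range(n_max):
        # A new row starts with the last entry of the previous row; each
        # further entry is the sum of its left neighbour and the entry
        # of the previous row sitting above that neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        bells.append(new_row[0])
        row = new_row
    return bells


if __name__ == "__main__":
    bells = bell_numbers(30)
    for n in (5, 10, 20, 30):
        print(f"{n} atomic strata -> {bells[n]} possible partitions")
```

With only 10 atomic strata there are already 115,975 partitions, and the count grows faster than exponentially, which is why a heuristic search such as a genetic algorithm is used instead of enumeration.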

Highlights

Let us suppose we need to design a sample survey, having available a complete frame containing information on the target population.
If our sample design is a stratified one, we need to choose how to form strata in the population, in order to take maximum advantage of the available auxiliary information.
The estimates related to the Ys are calculated. Their means and standard deviations are computed in order to produce the coefficient of variation (CV) of each variable in every domain.
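As a minimal illustration of the last point, the CV of a set of estimates is their standard deviation divided by their mean. The sketch below is in Python rather than R, uses made-up numbers and hypothetical domain names, and assumes the sample (n - 1) standard deviation; the source does not say which variant is used.

```python
from statistics import mean, stdev


def coefficient_of_variation(estimates):
    """CV of a collection of estimates: standard deviation divided by mean.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    return stdev(estimates) / mean(estimates)


# Hypothetical estimates of one Y variable, keyed by domain (made-up data).
estimates_by_domain = {
    "domain_1": [98.0, 102.0, 100.0, 101.0, 99.0],
    "domain_2": [48.0, 55.0, 50.0, 47.0, 50.0],
}

cv_by_domain = {
    domain: coefficient_of_variation(values)
    for domain, values in estimates_by_domain.items()
}

if __name__ == "__main__":
    for domain, cv in cv_by_domain.items():
        print(f"{domain}: CV = {cv:.4f}")
```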

Summary

Introduction

Let us suppose we need to design a sample survey, having available a complete frame containing information on the target population (identifiers plus auxiliary information). If our sample design is a stratified one, we need to choose how to form strata in the population, in order to take maximum advantage of the available auxiliary information. By best stratification, we mean the stratification that ensures the minimum cost of the sample necessary to satisfy the precision constraints set on the estimates of the target variables Y (constraints expressed as maximum expected coefficients of variation in the different domains of interest). The number of possible alternative stratifications for a given population frame may be very high, in some cases so high that it is not feasible to enumerate them all in order to find the best one. The implementation of the genetic algorithm in the package SamplingStrata (Barcaroli, Pagliuca, and Willighagen 2014) makes use of a modified version of the functions available in the genalg package (Willighagen 2014).
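To make the notion of a precision constraint concrete, the sketch below evaluates one candidate stratification with a fixed allocation. It uses the textbook variance of the estimated mean under stratified simple random sampling without replacement, and it takes the cost to be the total sample size. Both are simplifying assumptions made here for illustration; this is not the package's own cost function or allocation routine, and all names and numbers are hypothetical.

```python
from math import sqrt


def expected_cv(strata):
    """Expected CV of the estimated population mean of Y.

    Each stratum is a dict with keys:
      N    - number of frame units in the stratum
      n    - number of units sampled from the stratum
      mean - stratum mean of Y
      sd   - stratum standard deviation of Y
    Assumes simple random sampling without replacement within strata.
    """
    total_units = sum(s["N"] for s in strata)
    pop_mean = sum(s["N"] * s["mean"] for s in strata) / total_units
    variance = sum(
        (s["N"] / total_units) ** 2      # squared stratum weight
        * (1 - s["n"] / s["N"])          # finite population correction
        * s["sd"] ** 2 / s["n"]
        for s in strata
    )
    return sqrt(variance) / pop_mean


def sample_cost(strata):
    """Cost of the design, assumed here to be the total sample size."""
    return sum(s["n"] for s in strata)


def meets_constraint(strata, max_cv):
    """True if the expected CV does not exceed the maximum allowed CV."""
    return expected_cv(strata) <= max_cv


# Hypothetical stratification with two strata (made-up values).
candidate = [
    {"N": 100, "n": 10, "mean": 10.0, "sd": 2.0},
    {"N": 100, "n": 10, "mean": 30.0, "sd": 6.0},
]

if __name__ == "__main__":
    print("cost:", sample_cost(candidate))
    print("expected CV:", round(expected_cv(candidate), 4))
    print("meets 5% constraint:", meets_constraint(candidate, 0.05))
```

In a search over stratifications, each candidate would be scored by the smallest cost whose allocation still satisfies every constraint, for every target variable and every domain.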

Journal: Journal of Statistical Software	Publication Date: Jan 1, 2014
Citations: 21	License type: cc-by
