Abstract
<span>When designing a sample survey, constraints are usually set on the desired precision levels of one or more target estimates (the Ys). If a sampling frame is available that contains auxiliary information on each unit (the Xs), it is possible to adopt a stratified sample design. For any given stratification of the frame, in the multivariate case it is possible to solve the problem of the best allocation of units to strata by minimizing a cost function subject to precision constraints (or, conversely, by maximizing the precision of the estimates under a given budget). The problem is then to determine the best stratification of the frame, i.e., the one that ensures the overall minimal cost of the sample necessary to satisfy the precision constraints. The Xs can be categorical or continuous; continuous ones can be transformed into categorical ones. The most detailed stratification is given by the Cartesian product of the Xs (the atomic strata). One way to determine the best stratification is to explore exhaustively the set of all possible partitions derivable from the set of atomic strata, evaluating each one by calculating the corresponding cost in terms of the sample required to satisfy the precision constraints. This is unaffordable in practical situations, where the dimension of the space of partitions can be very high. Another way is to explore the space of partitions with an algorithm that is particularly suitable in such situations: the genetic algorithm. The R package SamplingStrata, based on the use of a genetic algorithm, makes it possible to determine the best stratification for a population frame, i.e., the one that ensures the minimum sample cost necessary to satisfy the precision constraints, in a multivariate and multi-domain case.</span>
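To give a sense of why exhaustive exploration is unaffordable: the number of ways to partition a set of n atomic strata is the Bell number B(n). The following Python sketch (the function name `bell_numbers` is introduced here for illustration only and is not part of SamplingStrata) computes these counts with the Bell triangle.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)], the number of partitions of a set
    of 0..n_max elements, computed with the Bell triangle."""
    bells = [1]          # B(0) = 1
    row = [1]            # first row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        # Each further entry is the sum of its left neighbour and the
        # entry above that neighbour.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


if __name__ == "__main__":
    counts = bell_numbers(20)
    for n in (5, 10, 20):
        print(f"{n} atomic strata -> {counts[n]} possible stratifications")
```

With only 10 atomic strata there are already 115975 candidate stratifications, and with 20 there are more than 5 * 10^13, each of which would need its own allocation step to be evaluated.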
Highlights
Let us suppose we need to design a sample survey, having available a complete frame containing information on the target population
If our sample design is a stratified one, we need to choose how to form strata in the population, in order to get the maximum advantage of the available auxiliary information
The estimates related to the Ys are calculated. Their means and standard deviations are computed, in order to produce the coefficient of variation (CV) related to each variable in every domain
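The CV computation mentioned in the last highlight can be sketched as follows. This is a minimal illustration, assuming the estimates come from repeated samples and are stored per (domain, variable) pair; the function name `cv_by_domain`, the data layout, and the use of the sample standard deviation are assumptions made here, not the package's implementation.

```python
from statistics import mean, stdev


def cv_by_domain(estimates):
    """Compute the coefficient of variation for each (domain, variable).

    `estimates` maps a (domain, variable) tuple to the list of estimates
    obtained from repeated samples (hypothetical layout).
    CV = standard deviation of the estimates / mean of the estimates.
    """
    return {
        key: stdev(values) / mean(values)
        for key, values in estimates.items()
    }


if __name__ == "__main__":
    # Toy estimates of two target variables in two domains.
    toy = {
        ("domain1", "Y1"): [98.0, 100.0, 102.0],
        ("domain1", "Y2"): [45.0, 50.0, 55.0],
        ("domain2", "Y1"): [190.0, 200.0, 210.0],
    }
    for key, cv in cv_by_domain(toy).items():
        print(key, round(cv, 4))
```

Each resulting CV can then be compared with the maximum expected CV set as a precision constraint for that variable and domain.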
Summary
Let us suppose we need to design a sample survey, having available a complete frame containing information on the target population (identifiers plus auxiliary information). If our sample design is a stratified one, we need to choose how to form strata in the population, in order to take maximum advantage of the available auxiliary information. By best stratification, we mean the stratification that ensures the minimum cost of the sample necessary to satisfy the precision constraints set on the estimates of the target variables Y (constraints expressed as maximum expected coefficients of variation in the different domains of interest). The number of possible alternative stratifications for a given population frame may be very high, in some cases too high for them to be enumerated in order to find the best one. The implementation of the genetic algorithm in the package SamplingStrata (Barcaroli, Pagliuca, and Willighagen 2014) makes use of a modified version of the functions available in the genalg package (Willighagen 2014).
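The idea of scoring a candidate stratification by the sample size it requires can be illustrated with a deliberately simplified sketch. It assumes a single target variable, a single domain, unit cost per sampled unit, and the textbook closed form for Neyman allocation when estimating a total; the package itself handles the multivariate, multi-domain case. The function name `required_sample_size` and the toy data are hypothetical.

```python
import math
from statistics import stdev


def required_sample_size(strata, cv_target):
    """Minimum total sample size (Neyman allocation, one target variable)
    needed so that the estimated total has CV <= cv_target.

    `strata` is a list of lists: the Y values of the frame units in each
    stratum (each stratum needs at least two units).
    Formula: n = (sum N_h S_h)^2 / ((cv * T)^2 + sum N_h S_h^2),
    where T is the population total.
    """
    total = sum(sum(values) for values in strata)
    sizes = [len(values) for values in strata]
    sds = [stdev(values) for values in strata]
    sum_ns = sum(n_h * s_h for n_h, s_h in zip(sizes, sds))
    sum_ns2 = sum(n_h * s_h ** 2 for n_h, s_h in zip(sizes, sds))
    target_variance = (cv_target * total) ** 2
    return math.ceil(sum_ns ** 2 / (target_variance + sum_ns2))


if __name__ == "__main__":
    # Two candidate stratifications of the same six-unit toy frame.
    two_strata = [[1, 2, 3], [10, 20, 30]]
    one_stratum = [[1, 2, 3, 10, 20, 30]]
    print(required_sample_size(two_strata, 0.10))   # cost with stratification
    print(required_sample_size(one_stratum, 0.10))  # cost without it
```

In this toy frame the two-stratum design satisfies the 10% CV constraint with fewer units than the unstratified one. An optimizer such as a genetic algorithm would use a score of this kind as the fitness of each candidate partition of the atomic strata.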