Constrained Inference When the Sampled and Target Populations Differ

Huijun Yi, Bhaskar Bhattacharya

doi:10.3390/e18030097

Open Access

Journal: Entropy	Publication Date: Mar 16, 2016
Citations: 5	License type: CC BY 4.0

Affiliation: Troy University, Southern Illinois University Carbondale

Abstract

In the analysis of contingency tables, one often faces two difficulties: the sampled and target populations are not identical, and prior information takes the form of general linear inequality restrictions. For these situations, we present new models for estimating cell probabilities, related to four well-known methods of estimation. We prove that each model yields maximum likelihood estimators under those restrictions. The performance ranking of these methods under equality restrictions is known. We compare the methods under inequality restrictions in a simulation study, which reveals that they may rank differently under inequality restrictions than under equality restrictions. The four methods are also compared in an analysis of US census data.

Highlights

When working with a sample contingency table, a researcher might need to adjust it based on information available from other sources.
Often this information comes as marginal information, such as row and/or column totals.
Consider a data set where each subject is cross-classified by income and urbanity, and marginal information about income and urbanity is available from a census.

Summary

Introduction

When working with a sample contingency table, a researcher might need to adjust it based on information available from other sources. For two-way contingency tables of size (I × J), four well-known [1,2] margin-adjusting methods for estimating cell probabilities are raking (RAKE), least squares (LSQ), minimum chi-squared (MCSQ) and maximum likelihood under random sampling (MLRS). Often sample units are too expensive to locate or are unwilling to participate in the survey. In this case, to estimate the target cell probabilities, we have to take a random sample from a sampled population that is systematically different from the target population. A similar problem in a regression context can be found in [4]. It is well known that all four margin-adjusting methods are asymptotically equivalent under simple random sampling. Their small-sample results can differ. For simulation (Section 4), we have restricted our attention to (2 × 2) tables to facilitate comparison with Little and Wu [3].
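
To make the idea of margin adjustment concrete, here is a minimal sketch of raking in its standard form (iterative proportional fitting) on a (2 × 2) table. It is an illustration only, not the authors' constrained models: the function name `rake`, the sample proportions, and the target margins are all hypothetical, and the sketch assumes every cell and every target margin is strictly positive.

```python
def rake(table, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rake a table to target margins by iterative proportional fitting.

    Assumes all cells and all target margins are strictly positive
    (an assumption of this sketch, so that no division by zero occurs).
    """
    fitted = [[float(x) for x in row] for row in table]
    for _ in range(max_iter):
        # Step 1: rescale each row so that it sums to its target margin.
        for i, row in enumerate(fitted):
            row_sum = sum(row)
            fitted[i] = [x * row_targets[i] / row_sum for x in row]
        # Step 2: rescale each column so that it sums to its target margin.
        for j in range(len(col_targets)):
            col_sum = sum(row[j] for row in fitted)
            for row in fitted:
                row[j] *= col_targets[j] / col_sum
        # Columns now match exactly; stop once the rows also match.
        err = max(abs(sum(row) - t) for row, t in zip(fitted, row_targets))
        if err < tol:
            break
    return fitted


# Hypothetical sample cell proportions and hypothetical census margins.
sample = [[0.30, 0.20],
          [0.10, 0.40]]
row_targets = [0.6, 0.4]
col_targets = [0.5, 0.5]

fitted = rake(sample, row_targets, col_targets)
for row in fitted:
    print([round(x, 4) for x in row])
```

Each rescaling step multiplies a whole row or column by a constant, so the fitted table keeps the cross-product (odds) ratio of the sample table while matching the supplied margins.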

Solutions from Each Method

Models Relating the Sampled and Target Populations

A Simulation Study

Design

Applying Four Methods to Real World Data

Conclusions