Using Earnings Data From the Monthly Current Population Survey

Anne Elise Polivka

doi:10.2139/ssrn.261190

Abstract

The monthly Current Population Survey (CPS) has collected earnings data on individuals' current jobs from one-quarter of the sample since 1979. This paper discusses three distinctive aspects of the CPS: (1) the 1994 redesign of the CPS data collection methodology, (2) the complex CPS sample design, and (3) the top coding of CPS earnings data. Failure to account for these features of the CPS may result in biased estimates and invalid research conclusions. The first part of the paper discusses the motivation for, and the effects of, the changes implemented with the 1994 redesign. To facilitate comparisons over time, adjustment factors were estimated for mean and median hourly earnings and for the 90th and 10th percentile cutoffs. These adjustment factors indicate that failing to account for the redesign could lead to serious mismeasurement of the pattern and magnitude of earnings inequality over time. A common misconception is that the CPS is a simple random sample; in reality it has a complex multistage sample design. The second and third parts of the paper discuss how to account properly for this design and how to obtain design-consistent point estimates and variances through the use of sample weights and replication methods such as Balanced Repeated Replication (BRR). As part of the discussion of sample weights, a simple test of whether using weights is appropriate is presented. Throughout, the effects of accounting for the complex design are illustrated with a standard OLS log earnings regression model. A comparison of results indicates that not properly accounting for the CPS sample design can significantly underestimate, or even reverse the sign of, some estimated effects, including those of education, membership in certain minority groups, occupational rewards, and living in certain regions of the country.
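To make the idea of replicate variance estimation concrete, here is a minimal sketch of textbook BRR, assuming a hypothetical design with exactly two primary sampling units (PSUs) per stratum, which is the structure BRR requires. Each row of a Hadamard matrix defines one half-sample: in every stratum one PSU is kept with its weights doubled and the other is dropped. The BRR variance is the average squared deviation of the replicate estimates from the full-sample estimate. The record layout, the stratum and PSU codes, the earnings values, and all function names are invented for illustration; they are not CPS variables, and the replicate weights actually produced for the CPS may use a different replication scheme and variance constant.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical record layout: (stratum, psu, weight, value).
# Strata are numbered 0..n_strata-1 and each stratum has exactly two PSUs (0 and 1).
Record = Tuple[int, int, float, float]
Estimator = Callable[[Sequence[float], Sequence[float]], float]


def sylvester_hadamard(order: int) -> List[List[int]]:
    """Build a Sylvester-type Hadamard matrix; order must be a power of 2."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of 2")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def brr_replicate_weights(records: Sequence[Record], n_strata: int) -> List[List[float]]:
    """Return one list of half-sample weights per row of the Hadamard matrix."""
    # Smallest power of 2 above n_strata, so column 0 (all ones) can be skipped;
    # the remaining columns give full orthogonal balance.
    order = 1
    while order <= n_strata:
        order *= 2
    replicates = []
    for row in sylvester_hadamard(order):
        weights = []
        for stratum, psu, weight, _ in records:
            # +1 keeps PSU 0, -1 keeps PSU 1; the kept PSU's weights are doubled.
            keep = (row[stratum + 1] == 1) == (psu == 0)
            weights.append(2.0 * weight if keep else 0.0)
        replicates.append(weights)
    return replicates


def weighted_total(weights: Sequence[float], values: Sequence[float]) -> float:
    return sum(w * y for w, y in zip(weights, values))


def weighted_mean(weights: Sequence[float], values: Sequence[float]) -> float:
    return weighted_total(weights, values) / sum(weights)


def brr_estimate(estimator: Estimator, records: Sequence[Record], n_strata: int) -> Tuple[float, float]:
    """Return (full-sample estimate, BRR variance) for the given estimator."""
    values = [r[3] for r in records]
    full = estimator([r[2] for r in records], values)
    replicates = brr_replicate_weights(records, n_strata)
    variance = sum((estimator(w, values) - full) ** 2 for w in replicates) / len(replicates)
    return full, variance


# Hypothetical toy sample: hourly earnings for 8 people in 3 strata.
records: List[Record] = [
    (0, 0, 100.0, 12.0), (0, 0, 150.0, 15.0), (0, 1, 120.0, 11.0),
    (1, 0, 200.0, 20.0), (1, 1, 180.0, 25.0), (1, 1, 90.0, 18.0),
    (2, 0, 110.0, 9.0), (2, 1, 130.0, 14.0),
]

unweighted_mean = sum(r[3] for r in records) / len(records)
mean, mean_var = brr_estimate(weighted_mean, records, n_strata=3)
total, total_var = brr_estimate(weighted_total, records, n_strata=3)
print(f"unweighted mean {unweighted_mean:.2f}, weighted mean {mean:.2f} (BRR s.e. {mean_var ** 0.5:.2f})")
```

In this toy sample the weighted mean (about 16.39) differs from the unweighted mean (15.50), which is the kind of gap a simple-random-sample assumption hides. For a weighted total, full orthogonal balance makes the BRR variance equal the standard two-PSU-per-stratum estimator, the sum over strata of (t_h1 - t_h2)^2, where t_hi is the weighted total of PSU i in stratum h; for a ratio such as the weighted mean, the BRR variance is an approximation.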
The paper concludes with a discussion of the effects of top coding, that is, of recoding individuals' earnings above a certain amount to a predetermined value. Comparisons of estimates generated from the original confidential, non-top-coded data and from the publicly released, non-confidential top-coded data indicate that even though only a small proportion of earnings are top coded, top coding can produce statistically significant differences, both in simple arithmetic means and in parameter estimates from OLS regression models. However, fitting a Pareto distribution to the upper tail of the earnings distribution using a modified maximum likelihood estimator was found to yield estimates that were usually quite close to those obtained when no top coding was imposed.
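The abstract does not spell out the modified maximum likelihood estimator, so the sketch below substitutes the standard censored maximum likelihood estimator for a Pareto tail; it should not be read as the paper's method. Above a chosen threshold x_m, earnings are assumed to follow a Pareto distribution with shape alpha, and values at the top code c are treated as right-censored. With k uncensored tail values x_i and m top-coded values, the estimate is alpha = k / (sum of ln(x_i / x_m) + m ln(c / x_m)), and each top-coded value is replaced by the Pareto conditional mean E[X | X >= c] = alpha c / (alpha - 1). The threshold, top code, earnings values, and function names are hypothetical, and sample weights are ignored for simplicity.

```python
import math
from typing import List, Sequence, Tuple


def pareto_alpha_censored(tail: Sequence[float], threshold: float, top_code: float) -> float:
    """Maximum likelihood estimate of the Pareto shape alpha on [threshold, inf),
    treating values at or above top_code as right-censored at top_code."""
    uncensored = [x for x in tail if x < top_code]
    n_censored = len(tail) - len(uncensored)
    # Each uncensored point contributes log(x / threshold); each censored point
    # contributes only its survival term, i.e. log(top_code / threshold).
    denom = sum(math.log(x / threshold) for x in uncensored)
    denom += n_censored * math.log(top_code / threshold)
    if not uncensored or denom <= 0:
        raise ValueError("need at least one uncensored value strictly above the threshold")
    return len(uncensored) / denom


def pareto_mean_above(alpha: float, top_code: float) -> float:
    """E[X | X >= top_code] for a Pareto tail: alpha * top_code / (alpha - 1)."""
    if alpha <= 1:
        raise ValueError("the Pareto mean is infinite when alpha <= 1")
    return alpha * top_code / (alpha - 1)


def top_code_adjusted_mean(values: Sequence[float], threshold: float, top_code: float) -> Tuple[float, float]:
    """Return (alpha, mean) after replacing each top-coded value with the fitted Pareto conditional mean."""
    tail = [x for x in values if x >= threshold]
    alpha = pareto_alpha_censored(tail, threshold, top_code)
    replacement = pareto_mean_above(alpha, top_code)
    adjusted = [replacement if x >= top_code else x for x in values]
    return alpha, sum(adjusted) / len(adjusted)


# Hypothetical weekly earnings with an invented top code of 2000; the two
# values of 2000 stand for respondents whose true earnings were higher.
earnings: List[float] = [400, 650, 800, 950, 1050, 1100, 1200, 1350, 1500, 2000, 2000]
naive_mean = sum(earnings) / len(earnings)
alpha, adjusted_mean = top_code_adjusted_mean(earnings, threshold=1000, top_code=2000)
print(f"alpha {alpha:.3f}, naive mean {naive_mean:.1f}, adjusted mean {adjusted_mean:.1f}")
```

Because a Pareto variable conditioned on exceeding c is again Pareto with scale c and the same shape, the replacement value depends only on alpha and the top code, which keeps the adjustment simple. The choice of threshold matters in practice: a threshold set too low forces the Pareto shape onto the middle of the distribution.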
