Abstract

Case selection is ubiquitous in public management research. Rarely do scholars have access to entire populations of interest. Yet, the manner by which scholars select samples to conduct their analyses can have profound consequences on their ability both to draw valid causal inferences and to estimate accurate relationships. In this article, we review the basic threats to inference that are likely to emerge in the presence of non-random case selection, with specific attention to their manifestation in empirical public management research. The article first reviews the threats to causal inference presented by case selection, focusing on their implications for internal and external validity. We then summarize a standard set of solutions to address potential problems for empirical models caused by non-random case selection. As part of this discussion, we review recent articles published in this journal to illustrate the prevalence of selection issues in contemporary public management studies, and then illustrate several techniques that have been developed to overcome specific problems to show their utility for public management research.

Causal inference is a central goal of social science. Scholars often conduct their research with the primary interest of understanding whether and to what extent a variable of interest influences some outcome. Given this central role, the pitfalls of drawing valid causal inferences have garnered much attention. Scholars highlight two standards with which a causal inference can be evaluated: internal and external validity. The former reflects the analyst’s ability to draw valid inferences about the relevant causal relationship under study, whereas the latter reflects the extent to which valid inferences (based on a given sample of cases) may extend to the population or to cases not part of the original analysis.
Although a variety of research design features can plague both internal and external validity, this article’s focus is on the role of case selection, by which we mean either the explicit or implicit inclusion of a subset of cases (e.g., people, organizations, jurisdictions) from a larger population, in a study seeking to make causal claims.

We would like to thank Ryan Welch for his research assistance in processing and coding JPART articles for selection issues, and Stephane Lavertu and David Weimer for sharing their data. We would also like to thank Ken Meier and Andy Whitford for their valuable suggestions, as well as the comments from the three anonymous reviewers. Address correspondence to the author at dmk74@georgetown.edu. Konisky and Reenock. JPART 23:361–393. doi:10.1093/jopart/mus051. Advance Access publication November 15, 2012. © The Author 2012. Published by Oxford University Press on behalf of the Journal of Public Administration Research and Theory, Inc. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

Whether in the study of personnel management, public budgeting, or policy analysis, public administration scholars generally do not possess the luxury of having access to populations of interest. Finite resources as well as practical limitations constrain most scholars to conducting analyses on samples of populations, and those samples may include non-randomly selected cases. When using such data, potential threats to both internal and external validity arise. Our purpose here is not to present a comprehensive overview of either research design or the case selection literature.
Rather, our goal is to review the main threats to valid causal inference presented by non-random case selection, with particular emphasis on empirical research in public management (broadly defined) that is quantitative in orientation. We deem this choice appropriate given the trends in public management scholarship in this and other leading journals.1 We orient our theoretical discussion in the counterfactual causal inference model and apply most of our empirical treatments within the regression framework, but it is important to note that, at a conceptual level, similar issues arise in qualitative research as well (Collier and Mahoney 1996; King, Keohane, and Verba 1994).

The article proceeds as follows. In the first two sections, we review the main threats to causal inference presented by case selection, focusing on their implications for internal and external validity. We also summarize a standard set of solutions to address potential problems caused by non-random case selection. We subsequently report the results of a review of recently published JPART articles to examine the prevalence of case selection issues. We then turn to detailed illustrations of several particular methods to address non-random case selection. Although these solutions are not novel, they tend to be underutilized in public management research, and one of the objectives of this article is to make the methods more accessible to scholars by illustrating their utility. We conclude the article with a few general suggestions.

Case Selection, Causal Inference, and Threats to Internal Validity

There is perhaps no better place to begin an examination of the conditions under which case selection may threaten the internal validity of a study than the counterfactual causal inference model of the classic experiment.
In the ideal experimental setting, the analyst has a theoretically interesting independent variable, D, whose presence is believed to causally alter the value of some outcome of interest, Y. To test this possibility, the analyst identifies subjects who are randomly assigned to one of two groups, treatment D(1) or control D(0). The treatment is then applied to subjects in D(1) and not to those in D(0), and each subject’s response on the outcome variable of interest, Y, is recorded.2 At the individual level, subjects in the treatment group realize a value of $d_i = 1$ and subjects in the control group realize a value of $d_i = 0$. The observable outcome variable, Y, is a function of two potential outcome variables, $Y^1$ and $Y^0$, such that $Y = DY^1 + (1-D)Y^0$. This means that each subject in the treatment group has an observable outcome in the treatment, $y_i^1$, and an unobservable counterfactual in the control, $y_i^0$. The same is assumed for each subject in the control group; each subject has an observable outcome in the control and an unobservable counterfactual in the treatment. At the individual level then, the causal effect of the treatment would be $\delta_i = y_i^1 - y_i^0$. Unfortunately, we cannot observe the outcome at the individual level for the counterfactual case, a feature referred to as the fundamental problem of causal inference (Holland 1986; King, Keohane, and Verba 1994; Morgan and Winship 2007).

1 More specifically, we limit our focus to empirical work with the explicit purpose of theory or hypothesis testing and do not consider case selection issues in the context of theory building.

2 To be clear, the “treatment” in most social science analysis and, in this case, public management research is nothing more than the main independent variable under investigation. This “treatment” can be dichotomous, ordinal, or continuous in measurement.
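The relationship $Y = DY^1 + (1-D)Y^0$ can be made concrete with a few lines of code. What follows is a minimal sketch under stated assumptions, not a procedure taken from the article: the six subjects, the outcome values, the constant effect of 2.0, and the names `potential_outcomes`, `observe`, and `observed_table` are all hypothetical and chosen only for illustration. The point is that each subject reveals exactly one potential outcome, so the individual-level effect cannot be computed from the observed data alone.

```python
import random

def potential_outcomes(n, effect=2.0, seed=0):
    """Hypothetical subjects, each with both potential outcomes (Y0, Y1)."""
    rng = random.Random(seed)
    y0 = [rng.gauss(10.0, 1.0) for _ in range(n)]
    y1 = [v + effect for v in y0]  # assumption: constant individual effect
    return y0, y1

def observe(y0, y1, d):
    """Apply Y = D*Y1 + (1 - D)*Y0 subject by subject."""
    return [di * a1 + (1 - di) * a0 for a0, a1, di in zip(y0, y1, d)]

y0, y1 = potential_outcomes(6)
d = [1, 0, 1, 0, 1, 0]  # hypothetical assignment: 1 = treatment, 0 = control
y = observe(y0, y1, d)

# What the analyst actually sees: one potential outcome per subject;
# the counterfactual is recorded as missing (None).
observed_table = [
    {"d": di, "y1": a1 if di == 1 else None, "y0": a0 if di == 0 else None}
    for a0, a1, di in zip(y0, y1, d)
]
for row in observed_table:
    print(row)
```

In the simulation both potential outcomes exist because we generated them; in real data only the non-missing column of each row is available, which is why the discussion moves to aggregate-level quantities.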
This problem is akin to one of missing data (Winship and Morgan 1999), since each subject can only be assigned to either the treatment $d_i = 1$ or the control $d_i = 0$. Accordingly, for a given subject, we can only ever observe either $y_i^1$ or $y_i^0$, but never both. For this reason, analysts must focus on aggregate-level causal effects. The key insight of this causal inference model is that the counterfactual outcome, which is essential to estimates of causal inference, cannot be observed directly; accordingly, various research methods must be used to approximate it. (See Hidalgo and Sekhon (2011) for a discussion of causality in this counterfactual framework.)

Table 1 displays the aggregate quantities of interest for each possible distribution of the potential outcome variables. The table shows each of the two potential outcome variables $Y^1$ and $Y^0$ across the columns and the treatment and control grouping by the rows. For those assigned to the treatment group, we have two outcomes: the observed outcome of treatment and the unobserved outcome for those in treatment had they been assigned to the control group. For those assigned to the control group, we have two outcomes: the observed outcome of control and the unobserved outcome for those in control had they been assigned to treatment. The most relevant quantity of interest is the average treatment (causal) effect (ATE), which represents the difference between the average outcome for the treatment group in the sample and the average outcome for the control group in the sample.3 More formally, the estimated ATE is

$$E[\delta] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 0] \qquad (1)$$

In addition to the ATE, there are two conditional treatment effects that are often of interest, both of which are displayed in Table 1. The average treatment effect of the

Table 1. The Fundamental Problem of Causal Inference

                   Y^1 (Treatment outcomes)            Y^0 (Control outcomes)
Treatment group    E[Y^1 | D=1], observable outcome    E[Y^0 | D=1], potential outcome    (ATT)
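The difference-in-means expression for the estimated ATE, and the reason assignment matters for it, can be illustrated with a short simulation. This is a hedged sketch rather than an analysis from the article: the population size, the true effect of 2.0, the selection rule (subjects with a high control outcome enter treatment), and the names `simulate` and `diff_in_means` are assumptions made purely for illustration.

```python
import random
from statistics import mean

def simulate(n=20000, effect=2.0, seed=1):
    """Hypothetical population with both potential outcomes for every subject."""
    rng = random.Random(seed)
    y0 = [rng.gauss(10.0, 2.0) for _ in range(n)]
    y1 = [v + effect for v in y0]  # assumption: constant effect, so the true ATE equals `effect`
    return y0, y1

def diff_in_means(y0, y1, d):
    """Estimate E[Y1 | D=1] - E[Y0 | D=0] using only the observable outcomes."""
    treated = [a1 for a1, di in zip(y1, d) if di == 1]
    control = [a0 for a0, di in zip(y0, d) if di == 0]
    return mean(treated) - mean(control)

y0, y1 = simulate()

# Random assignment: D is independent of the potential outcomes.
rng = random.Random(2)
d_random = [rng.randint(0, 1) for _ in y0]

# Non-random selection (hypothetical rule): subjects with a high
# control outcome end up in the treatment group.
d_selected = [1 if a0 > 10.0 else 0 for a0 in y0]

ate_random = diff_in_means(y0, y1, d_random)
ate_selected = diff_in_means(y0, y1, d_selected)

print("true ATE:", 2.0)
print("estimate under random assignment:", round(ate_random, 2))
print("estimate under non-random selection:", round(ate_selected, 2))
```

Under random assignment the difference in means stays close to the true effect, whereas under the assumed selection rule it absorbs the pre-existing gap between the groups and overstates the effect.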
