Not-so-average after all: Individual vs. aggregate effects in substantive research

Abstract

In non-linear models, the effect of a given variable cannot be gauged directly from the associated coefficient. Instead, researchers typically compute the average effect in the population to assess the substantive significance of the variable of interest. Based on the average response, analysts often make policy recommendations that are to be implemented at the individual level (i.e. the unit-of-analysis level). Such extrapolations, however, can lead to gross generalizations or incorrect inferences. The reason is that the mean may obscure large variation in individual effects, in which case the real-world applicability of the average value is limited. Correctly interpreting the average response may prevent unwarranted extrapolations but does not solve the problem of the lack of practical relevance. Particularly when cases carry special meaning (e.g. countries), the political and socioeconomic relevance of research findings should be assessed at the individual level. This article outlines the conditions under which aggregation to the mean is problematic, and advocates a case-centered approach to model evaluation. Specifically, we advise researchers to compute and report the quantity of interest for each case in the data. Only by seeing the full spread of cases can the reader assess how well the average summarizes the population. Our approach allows researchers to draw more meaningful inferences, and makes the connection between research and practical applications more realistic.
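The abstract's core point can be sketched numerically. In a logit model the marginal effect of a covariate is β·p·(1−p), which differs for every case, so the average effect may poorly summarize any individual unit. The sketch below uses hypothetical coefficients and simulated covariates (none of these values come from the article) to show how one would compute and report the per-case quantity of interest alongside the mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fitted logit coefficients (illustrative only, not from the article)
beta0, beta1, beta2 = -1.0, 0.8, 1.5

# Hypothetical covariate values for n cases (e.g. countries)
n = 200
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)

# Predicted probability for each case
eta = beta0 + beta1 * x1 + beta2 * x2
p = 1 / (1 + np.exp(-eta))

# Marginal effect of x1 for EACH case: dP/dx1 = beta1 * p * (1 - p)
individual_effects = beta1 * p * (1 - p)

# The single number usually reported ...
average_effect = individual_effects.mean()

# ... versus the full spread the article argues should be shown
print(f"average marginal effect of x1: {average_effect:.3f}")
print(f"range across cases: [{individual_effects.min():.3f}, "
      f"{individual_effects.max():.3f}]")
```

Plotting or tabulating `individual_effects` case by case (rather than reporting only `average_effect`) is one way to implement the case-centered reporting the abstract advocates.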

Similar Papers
  • Research Article
  • Cited by: 189
  • 10.1038/msb.2008.53
Models from experiments: combinatorial drug perturbations of cancer cells
  • Jan 1, 2008
  • Molecular Systems Biology
  • Sven Nelander + 7 more

We present a novel method for deriving network models from molecular profiles of perturbed cellular systems. The network models aim to predict quantitative outcomes of combinatorial perturbations, such as drug pair treatments or multiple genetic alterations. Mathematically, we represent the system by a set of nodes, representing molecular concentrations or cellular processes, a perturbation vector and an interaction matrix. After perturbation, the system evolves in time according to differential equations with built-in nonlinearity, similar to Hopfield networks, capable of representing epistasis and saturation effects. For a particular set of experiments, we derive the interaction matrix by minimizing a composite error function, aiming at accuracy of prediction and simplicity of network structure. To evaluate the predictive potential of the method, we performed 21 drug pair treatment experiments in a human breast cancer cell line (MCF7) with observation of phospho-proteins and cell cycle markers. The best derived network model rediscovered known interactions and contained interesting predictions. Possible applications include the discovery of regulatory interactions, the design of targeted combination therapies and the engineering of molecular biological networks.

  • Research Article
  • Cited by: 14
  • 10.1111/biom.12730
Model-based bootstrapping when correcting for measurement error with application to logistic regression.
  • May 30, 2017
  • Biometrics
  • John P Buonaccorsi + 2 more

When fitting regression models, measurement error in any of the predictors typically leads to biased coefficients and incorrect inferences. A plethora of methods have been proposed to correct for this. Obtaining standard errors and confidence intervals using the corrected estimators can be challenging and, in addition, there is concern about remaining bias in the corrected estimators. The bootstrap, which is one option to address these problems, has received limited attention in this context. It has usually been employed by simply resampling observations, which, while suitable in some situations, is not always formally justified. In addition, the simple bootstrap does not allow for estimating bias in non-linear models, including logistic regression. Model-based bootstrapping, which can potentially estimate bias in addition to being robust to the original sampling or whether the measurement error variance is constant or not, has received limited attention. However, it faces challenges that are not present in handling regression models with no measurement error. This article develops new methods for model-based bootstrapping when correcting for measurement error in logistic regression with replicate measures. The methodology is illustrated using two examples, and a series of simulations are carried out to assess and compare the simple and model-based bootstrap methods, as well as other standard methods. While not always perfect, the model-based approaches offer some distinct improvements over the other methods.

  • Research Article
  • Cited by: 9
  • 10.1016/j.ecosta.2020.03.008
On temporal aggregation of some nonlinear time-series models
  • May 19, 2020
  • Econometrics and Statistics
  • Wai-Sum Chan


  • Research Article
  • Cited by: 10
  • 10.1088/1752-7163/accf31
Review of linear and nonlinear models in breath analysis by Cyranose 320
  • May 26, 2023
  • Journal of Breath Research
  • Maryan Arrieta + 3 more

Analysis of volatile organic compounds (VOCs) in breath specimens has potential for point of care (POC) screening due to ease of sample collection. While the electronic nose (e-nose) is a standard VOC measure across a wide range of industries, it has not been adopted for POC screening in healthcare. One limitation of the e-nose is the absence of mathematical models of data analysis that yield easily interpreted findings at POC. The purposes of this review were to (1) examine the sensitivity/specificity results from studies that analyzed breath smellprints using the Cyranose 320, a widely used commercial e-nose, and (2) determine whether linear or nonlinear mathematical models are superior for analyzing Cyranose 320 breath smellprints. This systematic review was conducted according to the guidelines of the Preferred Reporting Items for Systematic Review and Meta-Analyses using keywords related to e-nose and breath. Twenty-two articles met the eligibility criteria. Two studies used a linear model while the rest used nonlinear models. The two studies that used a linear model had a smaller range for mean of sensitivity and higher mean (71.0%–96.0%; M = 83.5%) compared to the studies that used nonlinear models (46.9%–100%; M = 77.0%). Additionally, studies that used linear models had a smaller range for mean of specificity and higher mean (83.0%–91.5%; M = 87.2%) compared to studies that used nonlinear models (56.9%–94.0%; M = 76.9%). Linear models achieved smaller ranges for means of sensitivity and specificity compared to nonlinear models, supporting additional investigation of their use for POC testing. Because our findings were derived from studies of heterogeneous medical conditions, it is not known if they generalize to specific diagnoses.

  • Research Article
  • Cited by: 39
  • 10.1162/rest.90.3.406
Arbitrarily Normalized Coefficients, Information Sets, and False Reports of “Biases” in Binary Outcome Models
  • Aug 1, 2008
  • Review of Economics and Statistics
  • Thomas A Mroz + 1 more

Empirical researchers sometimes misinterpret how additional regressors, heterogeneity corrections, and multilevel factors impact the interpretation of the estimated parameters in binary outcome models such as logit and probit. This can result in incorrect inferences about the importance of incorporating such features in these nonlinear statistical models. Some reports of biases in binary outcome models appear related to the arbitrary variance normalization required in binary outcome models. A focus on readily interpretable numerical quantities, rather than conveniently chosen “effects” as measured by arbitrarily scaled coefficients, would eliminate nearly all of the interpretation problems we highlight in this paper.

  • Single Report
  • Cited by: 18
  • 10.3386/w14086
Use of Propensity Scores in Non-Linear Response Models: The Case for Health Care Expenditures
  • Jun 1, 2008
  • Anirban Basu + 2 more

Under the assumption of no unmeasured confounders, a large literature exists on methods that can be used to estimate average treatment effects (ATE) from observational data, spanning regression models, propensity score adjustments using stratification, weighting or regression, and even the combination of both as in doubly-robust estimators. However, comparison of these alternative methods is sparse in the context of data generated via non-linear models where treatment effects are heterogeneous, as in the case of healthcare cost data. In this paper, we compare the performance of alternative regression and propensity score-based estimators in estimating average treatment effects on outcomes that are generated via non-linear models. Using simulations, we find that in moderate size samples (n = 5000), balancing on estimated propensity scores balances the covariate means across treatment arms but fails to balance higher-order moments and covariances amongst covariates, raising concern about its use in non-linear outcome-generating mechanisms. We also find that besides inverse-probability weighting (IPW) with propensity scores, no one estimator is consistent under all data generating mechanisms. The IPW estimator is itself prone to inconsistency due to misspecification of the model for estimating propensity scores. Even when it is consistent, the IPW estimator is usually extremely inefficient. Thus care should be taken before naively applying any one estimator to estimate ATE in these data. We develop a recommendation for an algorithm which may help applied researchers to arrive at the optimal estimator. We illustrate the application of this algorithm and also the performance of alternative methods in a cost dataset on breast cancer treatment.
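The IPW estimator discussed in this abstract can be sketched in a few lines. The simulation below is purely illustrative (one confounder, a correctly specified logistic propensity model fit by Newton-Raphson; none of it is from the paper) and shows the Horvitz-Thompson weighting formula for the ATE:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000  # the moderate sample size discussed in the abstract

# Simulated data (illustrative): one confounder x affects both treatment and outcome
x = rng.normal(0, 1, n)
e_true = 1 / (1 + np.exp(-0.5 * x))      # true propensity score
t = rng.binomial(1, e_true)              # treatment assignment
y = 2.0 * t + x + rng.normal(0, 1, n)    # outcome; true ATE = 2.0

# Fit a logistic propensity model (intercept + x) by Newton-Raphson
X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b))
    grad = X.T @ (t - p)                              # score vector
    hess = X.T @ (X * (p * (1 - p))[:, None])         # observed information
    b += np.linalg.solve(hess, grad)
e_hat = 1 / (1 + np.exp(-X @ b))

# IPW (Horvitz-Thompson) estimate of the ATE
ate_ipw = np.mean(t * y / e_hat - (1 - t) * y / (1 - e_hat))
print(f"IPW ATE estimate: {ate_ipw:.2f}  (true value 2.0)")
```

Here the propensity model is correctly specified, so IPW recovers the ATE; the abstract's caution applies when that model is misspecified, or when extreme estimated propensities make the weights (and hence the estimator) very unstable.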

  • Research Article
  • Cited by: 22
  • 10.1080/09585176.2010.504573
On the relevance of the mathematics curriculum to young people
  • Sep 1, 2010
  • The Curriculum Journal
  • Paula Sealey + 1 more

In this article we draw upon focus group data from a large study of learner trajectories through 14–19 mathematics education to think about the notion of relevance in the mathematics curriculum. Drawing on data from three socially distanced sites we explore how different emphases on what might be termed practical, process and/or professional forms of relevance affect the experiences and aspirations of learners of mathematics. We consider whether an emphasis on practical relevance in schools serving relatively disadvantaged communities might aid the reproduction of students' social position. This leads us to suggest that a fourth category of curriculum relevance – political relevance – is largely missing from classrooms.

  • Research Article
  • Cited by: 25
  • 10.1002/jae.3950070502
Nonlinear dynamics and econometrics: An introduction
  • Dec 1, 1992
  • Journal of Applied Econometrics
  • M Hashem Pesaran + 1 more

The empirical modelling of economic time series is dominated by methods that assume linearity of the underlying dynamic economic system, the so-called Frisch-Slutsky paradigm. The main a priori argument in favour of linearity and the reason for its original adoption is its simplicity: autoregressive models can be estimated using standard regression packages and there is now a wide range of computer software packages available to estimate models with linear moving average components; the dynamics contained in estimated models can be completely characterized by their impulse response functions and directly related to linear models of the macroeconomy. The dominance of the Frisch-Slutsky paradigm is not based on any strong a priori belief that the economic system is linear. Indeed, once curvature is introduced into utility functions and/or production functions, nonlinearity is pervasive in theoretical dynamic models, hence the shortage of 'closed-form' solutions. Nor is it based on any convincing empirical evidence that actual economic time series are best described as linear stochastic processes. The LQ optimization approach (namely Linear constraints and Quadratic objective functions) which underlies most econometric applications (either implicitly or explicitly) is undoubtedly attractive on analytical and computational grounds, but can be highly deficient in areas where economic behaviour is dominated by asymmetric costs of adjustments, irreversibilities, transaction costs or institutional and physical rigidities. The challenge facing applied econometrics in dealing with these issues is truly immense. Reliable statistical techniques are required for detecting dynamic nonlinearities. Stochastic optimization models that are capable of capturing the primary sources of dynamic nonlinearities and are suitable for empirical analysis need to be developed. 
The possible effects of temporal aggregation, and aggregation across commodities and agents, need to be worked out in the context of nonlinear dynamic models. These are complicated issues, solutions to some of which have eluded us even in the case of linear models, and promise to be far more complicated when we enter the realm of nonlinear dynamic models. It is therefore natural to expect that real progress in the area of nonlinear dynamic econometric models will be slow and hard to come by. Nevertheless, the past two decades have witnessed important advances in the mathematical and statistical analysis of dynamic systems, particularly in physics, epidemiology and meteorology. Many of these developments have already found their way into economics and

  • Report Series
  • Cited by: 22
  • 10.1920/wp.cem.2002.1802
Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity
  • Jun 13, 2002
  • Jeffrey M Wooldridge

I study a simple, widely applicable approach to handling the initial conditions problem in dynamic, nonlinear unobserved effects models. Rather than attempting to obtain the joint distribution of all outcomes of the endogenous variables, I propose finding the distribution conditional on the initial value (and the observed history of strictly exogenous explanatory variables). The approach is flexible, and results in simple estimation strategies for at least three leading dynamic, nonlinear models: probit, Tobit, and Poisson regression. I treat the general problem of estimating average partial effects, and show that simple estimators exist for important special cases.

  • Research Article
  • Cited by: 1648
  • 10.1002/jae.770
Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity
  • Jan 1, 2005
  • Journal of Applied Econometrics
  • Jeffrey M Wooldridge

I study a simple, widely applicable approach to handling the initial conditions problem in dynamic, nonlinear unobserved effects models. Rather than attempting to obtain the joint distribution of all outcomes of the endogenous variables, I propose finding the distribution conditional on the initial value (and the observed history of strictly exogenous explanatory variables). The approach is flexible, and results in simple estimation strategies for at least three leading dynamic, nonlinear models: probit, Tobit and Poisson regression. I treat the general problem of estimating average partial effects, and show that simple estimators exist for important special cases.

  • Research Article
  • Cited by: 846
  • 10.1177/1536867x1101100306
Comparing Coefficients of Nested Nonlinear Probability Models
  • Oct 1, 2011
  • The Stata Journal: Promoting communications on statistics and Stata
  • Ulrich Kohler + 2 more

In a series of recent articles, Karlson, Holm, and Breen (Breen, Karlson, and Holm, 2011, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1730065 ; Karlson and Holm, 2011, Research in Stratification and Social Mobility 29: 221–237; Karlson, Holm, and Breen, 2010, http://www.yale.edu/ciqle/Breen Scaling %20effects.pdf) have developed a method for comparing the estimated coefficients of two nested nonlinear probability models. In this article, we describe this method and the user-written program khb, which implements the method. The KHB method is a general decomposition method that is unaffected by the rescaling or attenuation bias that arises in cross-model comparisons in nonlinear models. It recovers the degree to which a control variable, Z, mediates or explains the relationship between X and a latent outcome variable, Y*, underlying the nonlinear probability model. It also decomposes effects of both discrete and continuous variables, applies to average partial effects, and provides analytically derived statistical tests. The method can be extended to other models in the generalized linear model family.

  • Research Article
  • Cited by: 40
  • 10.1113/jphysiol.2011.225987
Requirement of neuronal connexin36 in pathways mediating presynaptic inhibition of primary afferents in functionally mature mouse spinal cord
  • Jul 27, 2012
  • The Journal of Physiology
  • Wendy Bautista + 3 more

Electrical synapses formed by gap junctions containing connexin36 (Cx36) promote synchronous activity of interneurones in many regions of mammalian brain; however, there is limited information on the role of electrical synapses in spinal neuronal networks. Here we show that Cx36 is widely distributed in the spinal cord and is involved in mechanisms that govern presynaptic inhibition of primary afferent terminals. Electrophysiological recordings were made in spinal cord preparations from 8- to 11-day-old wild-type and Cx36 knockout mice. Several features associated with presynaptic inhibition evoked by conditioning stimulation of low threshold hindlimb afferents were substantially compromised in Cx36 knockout mice. Dorsal root potentials (DRPs) evoked by low intensity stimulation of sensory afferents were reduced in amplitude by 79% and in duration by 67% in Cx36 knockouts. DRPs were similarly affected in wild-types by bath application of gap junction blockers. Consistent with presynaptic inhibition of group Ia muscle spindle afferent terminals on motoneurones described in adult cats, conditioning stimulation of an adjacent dorsal root evoked a long duration inhibition of monosynaptic reflexes recorded from the ventral root in wild-type mice, and this inhibition was antagonized by bicuculline. The same conditioning stimulation failed to inhibit monosynaptic reflexes in Cx36 knockout mice. Immunofluorescence labelling for Cx36 was found throughout the dorsal and ventral horns of the spinal cord of juvenile mice and persisted in mature animals. In deep dorsal horn laminae, where interneurones involved in presynaptic inhibition of large diameter muscle afferents are located, cells were extensively dye-coupled following intracellular neurobiotin injection. Coupled cells displayed Cx36-positive puncta along their processes. 
Our results indicate that gap junctions formed by Cx36 in spinal cord are required for maintenance of presynaptic inhibition, including the regulation of transmission from Ia muscle spindle afferents. In addition to a role in presynaptic inhibition in juvenile animals, the persistence of Cx36 expression among spinal neuronal populations in the adult mouse suggests that the contribution of electrical synapses to integrative processes in fully mature spinal cord may be as diverse as that found in other areas of the CNS.

  • Research Article
  • Cited by: 39
  • 10.1007/s002850050184
Stochastic host-parasite interaction models.
  • Apr 20, 2000
  • Journal of Mathematical Biology
  • Julian Herbert + 1 more

We contribute to the discussion of causes and effects of aggregation (overdispersion) of macroparasite counts, focussing particularly upon the effects of clumped infections and parasite-induced host mortality. The simple nonlinear stochastic model for the evolution of the parasite load of a single host, investigated in Isham (1995), is extended to allow three parasite stages (larval, mature and offspring), and to allow durations of these stages to be non-exponentially distributed. As in the earlier work, exact algebraic results are possible, providing insight into the aggregation mechanisms, as long as the only source of interaction between host and parasites is an excess host mortality linearly related to the parasite load. Results are obtained on the distribution of parasite load and on host survival. In particular, although parasite-induced host mortality is usually thought of as a process that reduces parasite aggregation (Anderson and Gordon 1982), it is shown that, for this model, parasite-induced host mortality cannot cause the index of dispersion to fall below unity. Host heterogeneity and disease control are also discussed. An approximation based on moment assumptions appropriate to a specially-constructed multivariate negative binomial distribution is proposed. This approximation, which is applicable to other processes, and an alternative based on the multivariate normal distribution are compared with exact results.

  • Research Article
  • Cited by: 175
  • 10.1162/glep_a_00294
Building Productive Links between the UNFCCC and the Broader Global Climate Governance Landscape
This article reflects and builds upon discussions at a December 2013 workshop held in Neemrana, India, sponsored by the Centre for Policy Research (New Delhi) and the Mitigation Action Plans and Scenarios (MAPS) program of the Energy Research Centre (Cape Town).
  • May 1, 2015
  • Global Environmental Politics
  • Michele Betsill + 5 more

This forum article outlines a research agenda focused on linkages between the UNFCCC and other governance arrangements that also address climate change. We take as our point of departure the recognition that the UNFCCC is no longer the sole site of global climate change governance, and thus the types of linkage across what we call the global climate governance landscape, with the UNFCCC as a central node, are important for thinking through how improved global responses to climate change may be pursued. The forum identifies two specific types of linkage: division-of-labor linkages and catalytic linkages. We illustrate these with some examples and raise questions we believe would be useful to pursue in future research.

  • Research Article
  • Cited by: 29
  • 10.1016/j.tra.2010.12.005
Estimating multimodal transit ridership with a varying fare structure
  • Jan 5, 2011
  • Transportation Research Part A: Policy and Practice
  • Konstantina Gkritza + 2 more

