Articles published on Biased Coefficient Estimates
- Research Article
55
- 10.1111/2041-210x.12322
- Dec 17, 2014
- Methods in Ecology and Evolution
- E Hance Ellington + 7 more
Summary: There is a growing need for scientific synthesis in ecology and evolution. In many cases, meta-analytic techniques can be used to complement such synthesis. However, missing data are a serious problem for any synthetic efforts and can compromise the integrity of meta-analyses in these and other disciplines. Currently, the prevalence of missing data in meta-analytic data sets in ecology and the efficacy of different remedies for this problem have not been adequately quantified. We generated meta-analytic data sets based on literature reviews of experimental and observational data and found that missing data were prevalent in meta-analytic ecological data sets. We then tested the performance of complete case removal (a widely used method when data are missing) and multiple imputation (an alternative method for data recovery) and assessed model bias, precision and multimodel rankings under a variety of simulated conditions using published meta-regression data sets. We found that complete case removal led to biased and imprecise coefficient estimates and yielded poorly specified models. In contrast, multiple imputation provided unbiased parameter estimates with only a small loss in precision. The performance of multiple imputation, however, was dependent on the type of data missing. It performed best when missing values were weighting variables, but performance was mixed when missing values were predictor variables. Multiple imputation performed poorly when imputing raw data which were then used to calculate effect size and the weighting variable. We conclude that complete case removal should not be used in meta-regression and that multiple imputation has the potential to be an indispensable tool for meta-regression in ecology and evolution. However, we recommend that users assess the performance of multiple imputation by simulating missing data on a subset of their data before implementing it to recover actual missing data.
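The paper's central contrast, complete case removal versus multiple imputation, is easy to reproduce in miniature. The sketch below is an illustration under assumed data, not the authors' code: a moderator's missingness depends on the observed effect sizes, listwise deletion is compared with estimates pooled over imputations from scikit-learn's IterativeImputer, and study weights are omitted for brevity.

```python
# Sketch: complete-case removal vs. multiple imputation in a simple
# (unweighted) meta-regression with a moderator missing at random.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)                          # fully observed moderator
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)    # moderator with missing values
effect = 0.5 * x1 + 0.3 * x2 + rng.normal(scale=0.5, size=n)

x2_obs = x2.copy()
miss_prob = 1 / (1 + np.exp(-2 * effect))        # missingness depends on outcome
x2_obs[rng.random(n) < miss_prob] = np.nan

def ols_beta(x1, x2, y):
    X = np.column_stack([np.ones_like(x1), x1, x2])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Complete-case analysis: drop studies with a missing moderator.
cc = ~np.isnan(x2_obs)
beta_cc = ols_beta(x1[cc], x2_obs[cc], effect[cc])

# Multiple imputation: impute m data sets, fit each, pool by averaging.
m = 10
betas = []
for seed in range(m):
    imp = IterativeImputer(sample_posterior=True, random_state=seed)
    filled = imp.fit_transform(np.column_stack([x1, x2_obs, effect]))
    betas.append(ols_beta(filled[:, 0], filled[:, 1], filled[:, 2]))
beta_mi = np.mean(betas, axis=0)

print("true coefficients:    [0.0, 0.5, 0.3]")
print("complete case:       ", np.round(beta_cc, 3))
print("multiple imputation: ", np.round(beta_mi, 3))
```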
- Research Article
45
- 10.1016/j.jeconom.2013.08.002
- Aug 11, 2013
- Journal of Econometrics
- Zongwu Cai + 1 more
Testing predictive regression models with nonstationary regressors
- Research Article
3
- 10.7465/jkdi.2013.24.3.651
- May 31, 2013
- Journal of the Korean Data and Information Science Society
- Gyo-Young Cho + 1 more
For the marginal model and the generalized estimating equations (GEE) method, the full covariates conditional mean (FCCM) assumption pointed out by Pepe and Anderson (1994) is important. With longitudinal data and time-varying stochastic covariates, this assumption does not necessarily hold, and if it is violated, biased estimates of the regression coefficients may result. If a diagonal working correlation matrix is used, however, the resulting estimates are (nearly) unbiased irrespective of whether the assumption is violated (Pan et al., 2000). The quadratic inference functions (QIF) method proposed by Qu et al. (2000) is based on the generalized method of moments (GMM) applied to GEE. The QIF yields a substantial improvement in efficiency for the estimator of β when the working correlation is misspecified, and equal efficiency to the GEE when the working correlation is correct (Qu et al., 2000). In this paper, we examine whether the QIF can improve on the GEE method when the FCCM assumption is violated. We show that the QIF with an exchangeable or AR(1) working correlation matrix is not consistent and asymptotically normal in this case, and that it may be less efficient than GEE with an independence working correlation. Our simulation studies verify these results.
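A small simulation, an assumed setup rather than anything from the paper, shows the mechanism at work: when a time-varying covariate feeds back on past idiosyncratic shocks, the FCCM assumption fails, and GEE with a diagonal (independence) working correlation stays nearly unbiased while an exchangeable working correlation drifts. With statsmodels, the comparison is a one-argument swap.

```python
# Sketch (assumed setup, not the paper's code): a covariate that feeds
# back on past idiosyncratic shocks violates the FCCM assumption.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_subj, n_time = 2000, 4
u = rng.normal(size=n_subj)                    # shared (exchangeable) error
w = rng.normal(size=(n_subj, n_time))          # idiosyncratic error
x = np.zeros((n_subj, n_time))
x[:, 0] = rng.normal(size=n_subj)
for t in range(1, n_time):
    # feedback: covariate responds to the previous idiosyncratic shock
    x[:, t] = 0.5 * w[:, t - 1] + rng.normal(size=n_subj)
y = 0.5 * x + u[:, None] + w                   # true slope: 0.5

groups = np.repeat(np.arange(n_subj), n_time)
X = sm.add_constant(x.ravel())
for cov in (sm.cov_struct.Independence(), sm.cov_struct.Exchangeable()):
    res = sm.GEE(y.ravel(), X, groups=groups, cov_struct=cov).fit()
    print(type(cov).__name__, np.round(res.params[1], 3))
```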
- Research Article
5
- 10.1111/j.1541-0420.2012.01833.x
- Feb 4, 2013
- Biometrics
- David M Zucker + 4 more
Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer.
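The mechanics of the problem can be seen with the textbook attenuation result for linear regression under classical measurement error. The sketch below uses the simple reliability-ratio correction, which is far more basic than the article's regularized corrected-score method, but it illustrates why ignoring covariate error biases the slope.

```python
# Sketch: classical measurement error attenuates the OLS slope; with a
# known error variance, a method-of-moments correction rescales it.
# (This is the textbook correction, not the article's corrected-score method.)
import numpy as np

rng = np.random.default_rng(7)
n, beta, sigma_u2 = 5000, 1.0, 0.5
x = rng.normal(size=n)                                # true covariate, variance 1
w = x + rng.normal(scale=np.sqrt(sigma_u2), size=n)   # error-prone measurement
y = beta * x + rng.normal(scale=0.5, size=n)

naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)
reliability = (np.var(w, ddof=1) - sigma_u2) / np.var(w, ddof=1)
corrected = naive / reliability

print(f"naive slope:     {naive:.3f}")     # ~ beta / (1 + 0.5) = 0.67
print(f"corrected slope: {corrected:.3f}") # ~ 1.0
```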
- Research Article
27
- 10.1371/journal.pone.0047705
- Oct 15, 2012
- PLoS ONE
- Jian Wang + 7 more
A mediation model explores the direct and indirect effects between an independent variable and a dependent variable by including other variables (or mediators). Mediation analysis has recently been used to dissect the direct and indirect effects of genetic variants on complex diseases using case-control studies. However, bias could arise in the estimations of the genetic variant-mediator association because the presence or absence of the mediator in the study samples is not sampled following the principles of case-control study design. In this case, the mediation analysis using data from case-control studies might lead to biased estimates of coefficients and indirect effects. In this article, we investigated a multiple-mediation model involving a three-path mediating effect through two mediators using case-control study data. We propose an approach to correct bias in coefficients and provide accurate estimates of the specific indirect effects. Our approach can also be used when the original case-control study is frequency matched on one of the mediators. We employed bootstrapping to assess the significance of indirect effects. We conducted simulation studies to investigate the performance of the proposed approach, and showed that it provides more accurate estimates of the indirect effects as well as the percent mediated than standard regressions. We then applied this approach to study the mediating effects of both smoking and chronic obstructive pulmonary disease (COPD) on the association between the CHRNA5-A3 gene locus and lung cancer risk using data from a lung cancer case-control study. The results showed that the genetic variant influences lung cancer risk indirectly through all three different pathways. The percent of genetic association mediated was 18.3% through smoking alone, 30.2% through COPD alone, and 20.6% through the path including both smoking and COPD, and the total genetic variant-lung cancer association explained by the two mediators was 69.1%.
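The bootstrap assessment of indirect effects can be sketched for the simplest single-mediator case; the article's model has two mediators and a three-path effect, and additionally corrects for case-control sampling, neither of which is reproduced here. Variable roles are illustrative.

```python
# Sketch: bootstrapping the indirect effect a*b in a single-mediator model.
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.binomial(2, 0.3, size=n).astype(float)   # e.g. genotype coded 0/1/2
m = 0.4 * x + rng.normal(size=n)                 # mediator
y = 0.3 * m + 0.2 * x + rng.normal(size=n)       # outcome

def indirect(idx):
    xi, mi, yi = x[idx], m[idx], y[idx]
    a = np.polyfit(xi, mi, 1)[0]                 # x -> m slope
    Xm = np.column_stack([np.ones(len(idx)), mi, xi])
    b = np.linalg.lstsq(Xm, yi, rcond=None)[0][1]  # m -> y slope given x
    return a * b

boot = [indirect(rng.integers(0, n, n)) for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect {indirect(np.arange(n)):.3f}, "
      f"95% bootstrap CI [{lo:.3f}, {hi:.3f}]")   # true value: 0.4 * 0.3 = 0.12
```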
- Research Article
16
- 10.1080/02664763.2012.709227
- Oct 1, 2012
- Journal of Applied Statistics
- Taeyoung Park + 2 more
A Poisson regression model with an offset assumes a constant baseline rate after accounting for measured covariates, which may lead to biased estimates of coefficients in an inhomogeneous Poisson process. To correctly estimate the effect of time-dependent covariates, we propose a Poisson change-point regression model with an offset that allows a time-varying baseline rate. When the non-constant pattern of a log baseline rate is modeled with a non-parametric step function, the resulting semi-parametric model involves a model component of varying dimensions and thus requires a sophisticated varying-dimensional inference to obtain the correct estimates of model parameters of a fixed dimension. To fit the proposed varying-dimensional model, we devise a state-of-the-art Markov chain Monte Carlo-type algorithm based on partial collapse. The proposed model and methods are used to investigate the association between the daily homicide rates in Cali, Colombia, and the policies that restrict the hours during which the legal sale of alcoholic beverages is permitted. While simultaneously identifying the latent changes in the baseline homicide rate which correspond to the incidence of sociopolitical events, we explore the effect of policies governing the sale of alcohol on homicide rates and seek a policy that balances the economic and cultural dependencies on alcohol sales to the health of the public.
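For reference, the baseline model whose misspecification motivates the article is the ordinary Poisson regression with an offset. The sketch below fits it with statsmodels under assumed data; the change-point extension and the partial-collapse MCMC sampler are beyond a short example.

```python
# Sketch: Poisson regression with an offset (constant baseline rate),
# the model the article extends with time-varying change points.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 365
policy = (np.arange(n) > 180).astype(float)     # hypothetical intervention
exposure = np.full(n, 24.0)                     # hours at risk per day
rate = np.exp(0.1 - 0.3 * policy)               # events per hour
y = rng.poisson(rate * exposure)

X = sm.add_constant(policy)
fit = sm.GLM(y, X, family=sm.families.Poisson(),
             offset=np.log(exposure)).fit()
print(np.round(fit.params, 3))                  # ~ [0.1, -0.3]
```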
- Research Article
8
- 10.1093/jjfinec/nbs011
- Sep 11, 2012
- Journal of Financial Econometrics
- M. Zhu
One of the fundamental econometric models in finance is predictive regression. The standard least squares method produces biased coefficient estimates when the regressor is persistent and its innovations are correlated with those of the dependent variable. This article proposes a general and convenient method based on the jackknife technique to tackle the estimation problem. The proposed method reduces the bias for both single- and multiple-regressor models and for both short- and long-horizon regressions. The effectiveness of the proposed method is demonstrated by simulations. An empirical application to equity premium prediction using the dividend yield and the short rate highlights the differences between the results by the standard approach and those by the bias-reduced estimator. The significant predictive variables under the ordinary least squares become insignificant after adjusting for the finite-sample bias. These discrepancies suggest that bias reduction in predictive regressions is important in practical applications.
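A split-sample jackknife, one common variant of the technique (the article's estimator may differ in detail), is easy to demonstrate: estimate on the full sample and on two halves, then combine so the leading O(1/T) bias terms cancel. The Monte Carlo below uses a persistent regressor whose innovations are negatively correlated with the return shocks.

```python
# Sketch: split-sample jackknife bias reduction in a predictive regression.
import numpy as np

rng = np.random.default_rng(11)

def simulate(T=100, rho=0.98, beta=0.0):
    e = rng.multivariate_normal([0, 0], [[1, -0.9], [-0.9, 1]], size=T)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + e[t, 1]          # persistent regressor
    y = beta * x + e[:, 0]                       # correlated innovations
    return y[1:], x[:-1]

def ols_slope(y, x):
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)

ols, jack = [], []
for _ in range(2000):
    r, xlag = simulate()
    b_full = ols_slope(r, xlag)
    h = len(r) // 2
    b_half = (ols_slope(r[:h], xlag[:h]) + ols_slope(r[h:], xlag[h:])) / 2
    ols.append(b_full)
    jack.append(2 * b_full - b_half)             # jackknife combination

print(f"mean OLS bias:       {np.mean(ols):+.4f}")
print(f"mean jackknife bias: {np.mean(jack):+.4f}")
```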
- Research Article
78
- 10.1016/j.ssresearch.2012.05.014
- Jun 13, 2012
- Social Science Research
- Richard York
Residualization is not the answer: Rethinking how to address multicollinearity
- Research Article
10
- 10.2139/ssrn.1975789
- Dec 22, 2011
- SSRN Electronic Journal
- R Glenn Hubbard + 2 more
This paper examines evidence of lending discrimination in prime and subprime mortgage markets in New Jersey. Existing single-equation studies of race-based discrimination in mortgage lending assume race is uncorrelated with the disturbance term in the loan denial regression. At the individual loan level, we show that race is correlated with both observable and unobservable risk variables, leading to biased coefficient estimates. To mitigate this problem, we specify a system of equations and use a full information maximum likelihood (FIML) method that does not need to identify instrumental variables for system identification. We find that minorities are less likely to be rejected than whites in the subprime market. The individual loan-level FIML results are robust to using two-stage least squares when we examine discrimination at the neighborhood level. We also find that the reduction in rejection rates to minority neighborhoods from 1996 to 2008 cannot be fully justified by risk, suggesting a relaxation of lending standards to minority neighborhoods. Using the methodology of Mian and Sufi [2009], we also find evidence for strong credit supply effects.
- Research Article
126
- 10.1007/s00181-011-0518-4
- Sep 8, 2011
- Empirical Economics
- Wolfgang Hess + 1 more
The recent literature on the duration of trade has predominantly analyzed the determinants of trade flow durations using Cox proportional hazards models. The purpose of this article is to show why it is inappropriate to analyze the duration of trade with continuous-time models such as the Cox model, and to propose alternative discrete-time models which are more suitable for estimation. In brief, the Cox model has three major drawbacks when applied to large trade data sets. First, it faces problems in the presence of many tied duration times, leading to biased coefficient estimates and standard errors. Second, it is difficult to properly control for unobserved heterogeneity, which can lead to parameter bias and bias in the estimated survivor function. Third, the Cox model imposes the restrictive and empirically questionable assumption of proportional hazards. In contrast, with discrete-time models there is no problem handling ties; unobserved heterogeneity can be controlled for without difficulty; and the restrictive proportional hazards assumption can easily be bypassed. By replicating an influential study by Besedes and Prusa (J Int Econ 70:339–358, 2006b), but employing discrete-time models as well as the original Cox model, we find empirical support for each of these arguments against the Cox model. Moreover, when comparing estimation results obtained from a Cox model and our preferred discrete-time specification, we find significant differences in both the predicted survivor functions and the estimated effects of explanatory variables on the hazard. In other words, the choice between models affects the economic conclusions that can be drawn.
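The discrete-time alternative is straightforward to set up: expand each spell into one row per period at risk and fit a binary model with a complementary log-log link, with period dummies supplying a flexible baseline hazard; ties pose no problem because the periods are discrete by construction. The sketch below uses simulated spells, not the trade data.

```python
# Sketch: a discrete-time duration model as an alternative to Cox.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, horizon = 1000, 10
x = rng.normal(size=n)                              # spell-level covariate
hazard = 1 - np.exp(-np.exp(-1.5 + 0.5 * x))        # cloglog per-period hazard
dur_raw = rng.geometric(hazard)
event = dur_raw <= horizon                          # failure observed?
dur = np.minimum(dur_raw, horizon)                  # censor at the horizon

rows = []
for i in range(n):
    for t in range(1, dur[i] + 1):                  # one row per period at risk
        rows.append((x[i], t, int(t == dur[i] and event[i])))
df = pd.DataFrame(rows, columns=["x", "period", "fail"])

X = pd.get_dummies(df["period"], prefix="t", dtype=float)  # flexible baseline
X["x"] = df["x"]
fit = sm.GLM(df["fail"], X,
             family=sm.families.Binomial(link=sm.families.links.CLogLog())).fit()
print(round(fit.params["x"], 3))                    # close to the true 0.5
```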
- Research Article
46
- 10.1016/j.jhe.2011.06.002
- Jul 14, 2011
- Journal of Housing Economics
- Steven Carter
Housing tenure choice and the dual income household
- Research Article
70
- 10.1093/gerona/glq188
- Oct 28, 2010
- The Journals of Gerontology, Series A: Biological Sciences and Medical Sciences
- T E Murphy + 5 more
Longitudinal studies in gerontology are characterized by termination of measurement due to death. Death is related to many important gerontological outcomes, such as functional disability, and may, over time, change the composition of an older study population. For these reasons, treating death as noninformative censoring of a longitudinal outcome may result in biased estimates of regression coefficients related to that outcome. In a longitudinal study of community-living older persons, we analytically and graphically illustrate the dependence between death and functional disability. Relative to survivors, decedents display a rapid decline of functional ability in the months preceding death. Death's strong relationship with functional disability demonstrates that death is not independent of this outcome and, hence, leads to informative censoring. We also demonstrate the "healthy survivor effect" that results from death's selection effect, with respect to functional disability, on the longitudinal makeup of an older study population. We briefly survey commonly used approaches for longitudinal modeling of gerontological outcomes, with special emphasis on their treatment of death. Most common methods treat death as noninformative censoring. However, joint modeling methods are described that take into account any dependency between death and a longitudinal outcome. In longitudinal studies of older persons, death is often related to gerontological outcomes and, therefore, cannot be safely assumed to represent noninformative censoring. Such analyses must account for the dependence between outcomes and death as well as the changing nature of the cohort.
- Research Article
123
- 10.1016/j.ijresmar.2010.06.001
- Sep 3, 2010
- International Journal of Research in Marketing
- Baohong Sun + 1 more
Stated intentions and purchase behavior: A unified model
- Research Article
4
- 10.2139/ssrn.1545516
- Feb 2, 2010
- SSRN Electronic Journal
- Alexander M Gelber
This paper examines the response of husbands' and wives' earnings to a tax reform in which husbands' and wives' tax rates changed independently, allowing me to examine the effect of both spouses' incentives on each spouse's behavior. I compare the results to those of more simplified econometric models that are used in the typical setting in which such independent variation is not available. Using administrative panel data on approximately 11% of the married Swedish population, I analyze the impact of the large Swedish tax reform of 1990-1. I find that in response to a compensated fall in one spouse's tax rate, that spouse's earned income rises, and the other spouse's earned income also rises. I test and reject a set of models in which the family maximizes a single utility function. A standard econometric specification, in which one spouse reacts to the other spouse's income as if it were unearned income, yields biased coefficient estimates. Uncompensated elasticities of earned income with respect to the fraction of income kept after taxes are over-estimated by a factor of more than three, and income effects are of the wrong sign. A second common specification, in which overall family income is related to the family's tax rate and income, also yields substantially over-estimated own compensated and uncompensated elasticities. Standard econometric specifications may substantially mis-estimate earnings responses to taxation.
- Research Article
212
- 10.1186/1471-2288-10-7
- Jan 19, 2010
- BMC Medical Research Methodology
- Andrea Marshall + 3 more
Background: There is no consensus on the most appropriate approach to handling missing covariate data within prognostic modelling studies. Therefore, a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model. Methods: Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five proportions of incomplete cases, from 5% to 75%, were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) a data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted and appropriate estimates for the regression coefficients and model performance measures were obtained. Results: Performing a CC analysis produced unbiased regression estimates but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, MICE-PMM produced, in general, the least biased estimates, better coverage for the incomplete covariates and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more cases had missing data imposed with a MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates, i.e. MNAR, estimates were biased with more than 10% incomplete cases for all MI approaches. Conclusion: The results from this simulation study suggest that MICE-PMM may be the preferred MI approach, provided that less than 50% of the cases have missing data and the missing data are not MNAR.
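The core of MICE-PMM is the predictive mean matching step: regress the incomplete variable on the others, then donate an observed value whose prediction is closest to that of each missing case. A simplified, non-Bayesian sketch is below; full MICE-PMM also draws the regression coefficients from their posterior and cycles over all incomplete variables.

```python
# Sketch of predictive mean matching (PMM) for a single incomplete variable.
import numpy as np

def pmm_impute(y, X, rng, k=5):
    """Impute NaNs in y by predictive mean matching on X."""
    obs = ~np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd[obs], y[obs], rcond=None)[0]
    yhat = Xd @ beta
    filled = y.copy()
    for i in np.where(~obs)[0]:
        # choose among the k observed donors with the closest predictions
        donors = np.argsort(np.abs(yhat[obs] - yhat[i]))[:k]
        filled[i] = rng.choice(y[obs][donors])
    return filled

rng = np.random.default_rng(8)
n = 300
x = rng.normal(size=n)
y = np.exp(0.5 * x + rng.normal(scale=0.4, size=n))  # skewed covariate
y_miss = y.copy()
y_miss[rng.random(n) < 0.3] = np.nan                 # 30% missing
imputed = pmm_impute(y_miss, x[:, None], rng)
print(f"observed mean {np.nanmean(y_miss):.3f}, "
      f"imputed-data mean {imputed.mean():.3f}")
```

Because donors are always observed values, PMM preserves the skewed shape of the variable, which is why it outperformed normal-theory imputation in this study.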
- Research Article
3
- 10.3182/20100906-3-it-2019.00062
- Jan 1, 2010
- IFAC Proceedings Volumes
- Tristan Perez + 3 more
Parameter Estimation of Thrust Models of Uninhabited Airborne Systems
- Research Article
15
- 10.2139/ssrn.1684615
- Jan 1, 2010
- SSRN Electronic Journal
- Wolfgang Hess + 1 more
The recent literature on the duration of trade has predominantly analyzed the determinants of trade flow durations using Cox proportional hazards models. The purpose of this paper is to show why it is inappropriate to analyze the duration of trade with continuous-time models such as the Cox model, and to propose alternative discrete-time models which are more suitable for estimation. Briefly, the Cox model has three major drawbacks when applied to large trade data sets. First, it faces problems in the presence of many tied duration times, leading to biased coefficient estimates and standard errors. Second, it is difficult to properly control for unobserved heterogeneity, which can result in spurious duration dependence and parameter bias. Third, the Cox model imposes the restrictive and empirically questionable assumption of proportional hazards. By contrast, with discrete-time models there is no problem handling ties; unobserved heterogeneity can be controlled for without difficulty; and the restrictive proportional hazards assumption can easily be bypassed. By replicating an influential study by Besedes and Prusa from 2006, but employing discrete-time models as well as the original Cox model, we find empirical support for each of these arguments against the Cox model. Moreover, when comparing estimation results obtained from a Cox model and our preferred discrete-time specification, we find significant differences in both the predicted hazard rates and the estimated effects of explanatory variables on the hazard. In other words, the choice between models affects the conclusions that can be drawn.
- Research Article
89
- 10.1007/s11187-009-9240-4
- Nov 7, 2009
- Small Business Economics
- Hanas A Cader + 1 more
Analyses of small businesses and the factors affecting their survival are fairly common in the research literature. The level of research interest may stem from the fact that in the US, only about half of all new small businesses survive after 4 years (Headd 2003). However, research that attempts to understand the phenomenon using data containing information only from and about surviving firms may lead to erroneous conclusions regarding the factors that influence firm survival and failure. In this paper, we provide evidence that omitted information about the firms that disappear from the research data over time leads to biased coefficient estimates. Comparing the Heckman two-step estimation approach of switching regression models to a semi-parametric Cox hazard model and the accelerated failure time (AFT) model, we conclude that the AFT approach is the most appropriate model for firm survival analysis.
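An AFT specification of the recommended kind can be fitted in a few lines with the lifelines package; the firm data below are simulated and the column names illustrative.

```python
# Sketch: a Weibull accelerated failure time (AFT) model for firm survival.
import numpy as np
import pandas as pd
from lifelines import WeibullAFTFitter

rng = np.random.default_rng(4)
n = 800
size = rng.normal(size=n)                       # e.g. log employment at start
# Weibull survival times; larger firms survive longer on average.
T = rng.weibull(1.5, size=n) * np.exp(0.4 * size)
event = T < 4.0                                 # firms still alive are censored
df = pd.DataFrame({"duration": np.minimum(T, 4.0),
                   "failed": event.astype(int),
                   "size": size})

aft = WeibullAFTFitter()
aft.fit(df, duration_col="duration", event_col="failed")
print(aft.params_.round(3))                     # size coefficient near 0.4
```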
- Research Article
1
- 10.2139/ssrn.1434425
- Jul 18, 2009
- SSRN Electronic Journal
- Turan G Bali + 1 more
There exists a small-sample bias in predictive regressions when a rate of return is regressed on a lagged stochastic regressor and the regression disturbance is correlated with the regressor's innovations. Although this bias can be a serious concern in time-series predictive regressions, it is not significant in a panel data setting. Using simulations and stock-level data, we document that as the number of cross-sections used in the panel data increases, the bias in coefficient estimates becomes negligible.
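A rough Monte Carlo, with assumed parameter values rather than the authors' stock-level data, illustrates the claim: the bias of the pooled OLS slope shrinks as independent cross-sections are added. To keep the sketch transparent, the regressions are fitted without an intercept.

```python
# Sketch: pooling cross-sections shrinks the small-sample predictive bias.
import numpy as np

rng = np.random.default_rng(9)
T, rho = 60, 0.95
cov = [[1.0, -0.9], [-0.9, 1.0]]                 # correlated shock pairs

def section():
    e = rng.multivariate_normal([0, 0], cov, size=T)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + e[t, 1]          # persistent predictor
    return e[1:, 0], x[:-1]                      # returns (true beta = 0), lag

def pooled_bias(N, reps=400):
    est = []
    for _ in range(reps):
        num = den = 0.0
        for _ in range(N):
            r, xlag = section()
            num += xlag @ r                      # pooled OLS numerator
            den += xlag @ xlag                   # pooled OLS denominator
        est.append(num / den)
    return np.mean(est)

for N in (1, 10, 50):
    print(f"N={N:2d}: mean pooled OLS bias {pooled_bias(N):+.4f}")
```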
- Research Article
23
- 10.1002/env.972
- Apr 3, 2009
- Environmetrics
- Michael G Schimek
For more than a decade generalized additive models (GAMs) have been successfully applied in various environmental studies, for instance to evaluate the impact of air pollution on health. The air pollution measure is usually connected with the health indicator in a parametric fashion whilst the effects of other covariates are modelled through nonparametric smooth functions. This is the motivation for the widely used semiparametric GAMs. The backfitting-GAM methodology, and its popular implementation in S-Plus, constitutes the standard approach. Here we consider its limitations and offer an alternative penalized likelihood concept. The primary limitations are the lack of tools for multiple data-driven smoothing parameter choice, slow convergence of the iterative backfitting algorithm when concurvity is present, and unstable biased estimates of the regression coefficients and their standard errors in the semiparametric case, even more pronounced under concurvity. The penalized likelihood methodology, when combined with cubic spline smoothers, allows for a computationally efficient and complete parametric representation of a GAM, either nonparametric or semiparametric. It helps circumvent most of the mentioned GAM flaws in environmental research and epidemiology (e.g. in studies of human exposure to particulate matter). Finally, we discuss evidence from simulation experiments in the literature concerning the proper use of the GAM methodology.
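A penalized-likelihood semiparametric GAM of the kind advocated here can be fitted with statsmodels' GLMGam, combining a parametric pollution term with a penalized spline; the data and the fixed penalty weight below are illustrative (statsmodels also offers data-driven penalty selection).

```python
# Sketch: a semiparametric Poisson GAM fitted by penalized likelihood,
# not backfitting; variable names and the penalty weight are assumptions.
import numpy as np
import statsmodels.api as sm
from statsmodels.gam.api import GLMGam, BSplines

rng = np.random.default_rng(6)
n = 500
pollution = rng.normal(size=n)                   # parametric term of interest
temp = rng.uniform(0, 10, size=n)                # confounder with smooth effect
mu = np.exp(2.0 + 0.1 * pollution + 0.3 * np.sin(temp))
deaths = rng.poisson(mu)

spline = BSplines(temp[:, None], df=[8], degree=[3])
X = sm.add_constant(pollution)
gam = GLMGam(deaths, X, smoother=spline, alpha=10.0,
             family=sm.families.Poisson())
res = gam.fit()
print(np.round(res.params[:2], 3))               # intercept, pollution slope ~0.1
```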