Articles published on Biased Coefficient Estimates
- Research Article
3
- 10.1111/apel.12284
- Apr 1, 2020
- Asian-Pacific Economic Literature
- Tanthaka Vivatsurakit + 1 more
Thailand experienced rapid economic development and made significant investments in education over the past four decades; however, more than half of Thai workers remain informally employed. Despite the prevalence and persistence of informal work in Thailand, little is known about the returns to investments in formal education among informal workers. Using individual‐level data from the 2011, 2013, and 2015 Thailand Household Socio‐economic Surveys, this study estimates the wage returns to years of education for informal workers using an instrumental variable (IV) approach to correct for potentially biased coefficient estimates on years of education due to unobserved ability. Contrary to expectations, informally employed Thai workers find substantial returns to investments in formal education. The results under the IV approach indicate that the return to an additional year of education for the informally employed is 11–12 per cent, compared to almost 15 per cent for formally employed private firm workers.
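The ability-bias argument in this abstract can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the authors' specification: the variable names, the single hypothetical instrument, and all parameter values (including a true return of 0.10) are assumptions chosen only to show why OLS and IV estimates can differ when ability is unobserved.

```python
import numpy as np

def simulate_returns(n=20000, true_return=0.10, seed=0):
    """Compare OLS and IV slopes when unobserved ability drives both schooling and wages."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)        # unobserved by the analyst (assumption)
    instrument = rng.normal(size=n)     # hypothetical instrument: shifts schooling only
    schooling = 10 + instrument + ability + rng.normal(size=n)
    log_wage = (1.0 + true_return * schooling + 0.2 * ability
                + rng.normal(scale=0.3, size=n))

    # OLS slope: cov(schooling, wage) / var(schooling); absorbs the ability effect
    ols = np.cov(schooling, log_wage)[0, 1] / np.var(schooling, ddof=1)
    # IV (Wald) slope: cov(instrument, wage) / cov(instrument, schooling)
    iv = np.cov(instrument, log_wage)[0, 1] / np.cov(instrument, schooling)[0, 1]
    return ols, iv

if __name__ == "__main__":
    ols, iv = simulate_returns()
    print(f"OLS estimate: {ols:.3f}  IV estimate: {iv:.3f}  (true return: 0.100)")
```

In this synthetic setup the OLS slope is pushed away from the true return because schooling is correlated with the omitted ability term, while the IV slope uses only the variation in schooling induced by the instrument. The direction and size of the bias here follow from the assumed parameters and say nothing about the Thai data.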
- Research Article
7
- 10.1016/j.ijforecast.2019.11.003
- Mar 24, 2020
- International Journal of Forecasting
- Y Dendramis + 3 more
Predicting default risk under asymmetric binary link functions
- Research Article
5
- 10.1177/1077558719858839
- Jul 9, 2019
- Medical Care Research and Review
- Jennifer M Mellor + 2 more
Previous studies show that survey-based reports of Medicaid participation are measured with error, but no prior study has examined measurement error in an important segment of the Medicaid population: low-income adults enrolled in Medicare. Using the Medicare Current Beneficiary Survey, we examine whether respondent self-reports of Medicaid enrollment match administrative records and present several key findings. First, among low-income Medicare beneficiaries, the false negative rate is 11.5% when the self-report is interpreted as full Medicaid and 3.7% when the self-report is interpreted as full or partial Medicaid. Second, the likelihood of a false negative report is systematically associated with respondent traits. Third, systematic measurement error results in biased coefficient estimates in models of Medicaid participation defined from self-reports, and the bias is more significant when the researcher interprets self-reports as full Medicaid coverage only. Researchers should use caution when interpreting survey reports as pertaining to full Medicaid coverage only.
- Research Article
4
- 10.1177/0361198119841856
- Apr 16, 2019
- Transportation Research Record: Journal of the Transportation Research Board
- Anusha Musunuru + 1 more
Road safety modelers frequently use average annual daily traffic (AADT) as a measure of exposure in regression models of expected crash frequency for road segments and intersections. Recorded AADT values at most locations are estimated by state and local transportation agencies with significant uncertainty, often by extrapolating short-term traffic counts over time and space. This uncertainty in the traffic volume estimates, often termed in a modeling context as measurement error in right-hand-side variables, can have serious effects on model estimation, including: 1) biased regression coefficient estimates; and 2) increases in dispersion. The structure and magnitude of measurement error in AADT estimates are not clearly understood by researchers or practitioners, leading to difficulties in explicitly accounting for this error in statistical road safety models, and ultimately in finding solutions for its correction. This study explores the impacts of measurement error in traffic volume estimates on statistical road safety models by employing measurement error correction approaches, including regression calibration and simulation extrapolation. The concept is demonstrated using crash, traffic, and roadway data from rural, two-lane horizontal curves in the State of Washington. The overall results show that the regression coefficient estimates with a positive coefficient were larger and those with a negative coefficient were smaller (i.e., more negative) when the measurement error correction methods were applied to the regression models of expected crash frequency. Future directions in applications of measurement error correction approaches to road safety research are provided.
- Research Article
5
- 10.1017/pan.2019.6
- Mar 25, 2019
- Political Analysis
- Benjamin E Bagozzi + 3 more
We develop a new Bayesian split population survival model for the analysis of survival data with misclassified event failures. Within political science survival data, right-censored survival cases are often erroneously misclassified as failure cases due to measurement error. Treating these cases as failure events within survival analyses will underestimate the duration of some events. This will bias coefficient estimates, especially in situations where such misclassification is associated with covariates of interest. Our split population survival estimator addresses this challenge by using a system of two equations to explicitly model the misclassification of failure events alongside a parametric survival process of interest. After deriving this model, we use Bayesian estimation via slice sampling to evaluate its performance with simulated data, and in several political science applications. We find that our proposed “misclassified failure” survival model allows researchers to accurately account for misclassified failure events within the contexts of civil war duration and democratic survival.
- Research Article
- 10.2139/ssrn.3337856
- Feb 19, 2019
- SSRN Electronic Journal
- Yiannis Dendramis + 3 more
In this paper we propose the use of an asymmetric binary link function to extend the proportional hazard model for predicting loan default. The rationale behind this approach is that the symmetry assumption, which has been widely used in the literature, can be quite restrictive, especially during periods of financial distress. In our approach we allow for a flexible level of asymmetry in the probability of default by the use of the skewed logit distribution. This enables us to estimate the actual level of asymmetry that is associated with the data at hand. We apply our approach to both simulated data and a rich micro dataset of consumer loan accounts. Our results provide clear-cut evidence that ignoring the actual level of asymmetry leads to seriously biased estimates of the slope coefficients, inaccurate marginal effects of the covariates of the model, and overestimation of the probability of default. Regarding the predictive power of the covariates of the model, we have found that loan-specific covariates contain considerably more information about loan default than macroeconomic covariates, which are often used in practice to carry out macroprudential stress testing.
- Discussion
31
- 10.1016/j.indmarman.2019.02.008
- Feb 1, 2019
- Industrial Marketing Management
- Richard T Gretz + 1 more
Rejoinder to “Endogeneity bias in marketing research: Problem, causes and remedies”
- Research Article
11
- 10.1177/0081175018793460
- Aug 30, 2018
- Sociological Methodology
- Benjamin F Jarvis
This comment reconsiders advice offered by Bruch and Mare regarding sampling choice sets in conditional logistic regression models of residential mobility. Contradicting Bruch and Mare's advice, past econometric research shows that no statistical correction is needed when using simple random sampling of unchosen alternatives to pare down respondents' choice sets. Using data on stated residential preferences contained in the Los Angeles portion of the Multi-City Study of Urban Inequality, it is shown that following Bruch and Mare's advice (to implement a statistical correction for simple random choice set sampling) leads to biased coefficient estimates. This bias is all but eliminated if the sampling correction is omitted.
- Research Article
2
- 10.1080/03610918.2015.1005230
- Nov 1, 2017
- Communications in Statistics - Simulation and Computation
- Xin Xin + 2 more
Recent work has shown that the presence of ties between an outcome event and the time that a binary covariate changes or jumps can lead to biased estimates of regression coefficients in the Cox proportional hazards model. One proposed solution is the Equally Weighted method. The coefficient estimate of the Equally Weighted method is defined to be the average of the coefficient estimates of the Jump Before Event method and the Jump After Event method, where these two methods assume that the jump always occurs before or after the event time, respectively. In previous work, the bootstrap method was used to estimate the standard error of the Equally Weighted coefficient estimate. However, the bootstrap approach was computationally intensive and resulted in overestimation. In this article, two new methods for the estimation of the Equally Weighted standard error are proposed. Three alternative methods for estimating both the regression coefficient and the corresponding standard error are also proposed. All the proposed methods are easy to implement. The five methods are investigated using a simulation study and are illustrated using two real datasets.
- Research Article
172
- 10.1017/psrm.2017.4
- May 3, 2017
- Political Science Research and Methods
- Arjun S Wilkins
Lagged dependent variables (LDVs) have been used in regression analysis to provide robust estimates of the effects of independent variables, but some research argues that using LDVs in regressions produces negatively biased coefficient estimates, even if the LDV is part of the data-generating process. I demonstrate that these concerns are easily resolved by specifying a regression model that accounts for autocorrelation in the error term. This actually implies that more LDV and lagged independent variables should be included in the specification, not fewer. Including the additional lags yields more accurate parameter estimates, which I demonstrate using the same data-generating process scholars had previously used to argue against including LDVs. I use Monte Carlo simulations to show that this specification returns much more accurate coefficient estimates for independent variables (across a wide range of parameter values) than alternatives considered in earlier research. The simulation results also indicate that improper exclusion of LDVs can lead to severe bias in coefficient estimates. While no panacea, scholars should continue to confidently include LDVs as part of a robust estimation strategy.
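The claim that improperly excluding an LDV can bias coefficient estimates can be sketched with a short simulation. This is an illustrative assumption-laden example, not the data-generating process or the specification used in the article: it uses i.i.d. errors, a single autocorrelated regressor, and arbitrary parameter values (`alpha`, `beta`, `rho` are hypothetical names).

```python
import numpy as np

def simulate_ldv(n=5000, alpha=0.5, beta=1.0, rho=0.8, seed=0):
    """Estimate beta with and without the lagged dependent variable in the model."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    y = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal()                  # autocorrelated regressor
        y[t] = alpha * y[t - 1] + beta * x[t] + rng.normal()  # the LDV is in the true process

    target = y[1:]
    with_ldv = np.column_stack([np.ones(n - 1), x[1:], y[:-1]])
    without_ldv = np.column_stack([np.ones(n - 1), x[1:]])

    # Coefficient on x from each least-squares fit (column index 1)
    b_with = np.linalg.lstsq(with_ldv, target, rcond=None)[0][1]
    b_without = np.linalg.lstsq(without_ldv, target, rcond=None)[0][1]
    return b_with, b_without

if __name__ == "__main__":
    b_with, b_without = simulate_ldv()
    print(f"beta with LDV: {b_with:.3f}  beta without LDV: {b_without:.3f}  (true: 1.000)")
```

Under these assumed values, dropping the LDV lets the coefficient on `x` absorb the effect of past values of `x` carried through `y[t-1]`, so it drifts toward `beta / (1 - alpha * rho)` rather than `beta`. The article's main argument about autocorrelated errors and additional lags is not reproduced here.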
- Research Article
34
- 10.1111/sjos.12275
- Apr 21, 2017
- Scandinavian Journal of Statistics
- Garritt L Page + 3 more
In studies that produce data with spatial structure, it is common that covariates of interest vary spatially in addition to the error. Because of this, the error and covariate are often correlated. When this occurs, it is difficult to distinguish the covariate effect from residual spatial variation. In an i.i.d. normal error setting, it is well known that this type of correlation produces biased coefficient estimates, but predictions remain unbiased. In a spatial setting, recent studies have shown that coefficient estimates remain biased, but spatial prediction has not been addressed. The purpose of this paper is to provide a more detailed study of coefficient estimation from spatial models when covariate and error are correlated and then begin a formal study regarding spatial prediction. This is carried out by investigating properties of the generalized least squares estimator and the best linear unbiased predictor when a spatial random effect and a covariate are jointly modelled. Under this setup, we demonstrate that the mean squared prediction error is possibly reduced when covariate and error are correlated.
- Research Article
7
- 10.3102/1076998617700598
- Apr 12, 2017
- Journal of Educational and Behavioral Statistics
- Steven Andrew Culpepper + 1 more
A latent multivariate regression model is developed that employs a generalized asymmetric Laplace (GAL) prior distribution for regression coefficients. The model is designed for high-dimensional applications where an approximate sparsity condition is satisfied, such that many regression coefficients are near zero after accounting for all the model predictors. The model is applicable to large-scale assessments such as the National Assessment of Educational Progress (NAEP), which includes hundreds of student, teacher, and school predictors of latent achievement. Monte Carlo evidence suggests that employing the GAL prior provides more precise estimation of coefficients that equal zero in comparison to a multivariate normal (MVN) prior, which translates to more accurate model selection. Furthermore, the GAL yielded less biased estimates of regression coefficients in smaller samples. The developed model is applied to mathematics achievement data from the 2011 NAEP for 175,200 eighth graders. The GAL and MVN NAEP estimates were similar, but the GAL was more parsimonious by selecting 12 fewer (i.e., 83 of the 148) variable groups. There were noticeable differences between estimates computed with a GAL prior and plausible value regressions with the AM software (beta version 0.06.00). Implications of the results are discussed for test developers and applied researchers.
- Research Article
3
- 10.2139/ssrn.2947340
- Apr 8, 2017
- SSRN Electronic Journal
- Mark J Kamstra
The existence of reversals and momentum in equity returns has challenged proponents of efficient markets for over 30 years. Although explanations for momentum profits based on cross-sectional mean return dispersion have been proposed, evidence of time-series autocorrelation from Fama-MacBeth cross-sectional regressions persists without any good risk/return explanation. In this paper I show that common implementations of the Fama-MacBeth procedure will yield upward biased estimates of time-series autocorrelation coefficients. Even in absence of autocorrelation, the bias is strictly positive, leading to apparent momentum when there is, in fact, none. This biased implementation of the Fama-MacBeth procedure has found its way into a great many other studies and may, similarly, lead to apparent effects when there are none. I outline conditions under which this bias occurs and prove the existence of bias under these conditions. I also provide a Monte Carlo simulation showing the magnitude of the bias, I demonstrate the impact of this bias with reference to published results in the literature, and I introduce a new test for misspecification of an asset pricing model. Additionally, I suggest and explore simple fixes for this bias. Some variation of a firm fixed-effects model is appropriate to correct for this bias in applications using the Fama-MacBeth method.
- Research Article
- 10.2139/ssrn.2940185
- Mar 24, 2017
- SSRN Electronic Journal
- Shima Amini + 1 more
We show that expected returns on US stocks and all major stock world market indices are non-linearly dependent on previous returns. The expected sign of returns tends to reverse after large price movements and trends tend to continue after small movements. This property can be captured by a simple polynomial model. Incorrectly fitting a simple linear model to the data leads to substantial bias in coefficient estimates and the polynomial model can be used to eliminate trends in the data. In addition, well known technical trading rules may be substantially driven by the non-linear behavior observed.
- Research Article
67
- 10.1037/met0000072
- Sep 1, 2016
- Psychological Methods
- Mijke Rhemtulla
Previous research has suggested that the use of item parcels in structural equation modeling can lead to biased structural coefficient estimates and low power to detect model misspecification. The present article describes the population performance of items, parcels, and scales under a range of model misspecifications, examining structural path coefficient accuracy, power, and population fit indices. Results revealed that, under measurement model misspecification, any parceling scheme typically results in more accurate structural parameters, but less power to detect the misspecification. When the structural model is misspecified, parcels do not affect parameter accuracy, but they do substantially elevate power to detect the misspecification. Under particular, known measurement model misspecifications, a parceling scheme can be chosen to produce the most accurate estimates. The root mean square error of approximation and the standardized root mean square residual are more sensitive to measurement model misspecification in parceled models than the likelihood ratio test statistic.
- Research Article
5
- 10.1111/jtsa.12205
- Aug 8, 2016
- Journal of Time Series Analysis
- William Dunsmuir + 1 more
This article develops asymptotic theory for estimation of parameters in regression models for binomial response time series where serial dependence is present through a latent process. Use of generalized linear model estimating equations leads to asymptotically biased estimates of regression coefficients for binomial responses. An alternative is to use marginal likelihood, in which the variance of the latent process but not the serial dependence is accounted for. In practice, this is equivalent to using generalized linear mixed model estimation procedures treating the observations as independent with a random effect on the intercept term in the regression model. We prove that this method leads to consistent and asymptotically normal estimates even if there is an autocorrelated latent process. Simulations suggest that the use of marginal likelihood can lead to generalized linear model estimates result. This problem reduces rapidly with increasing number of binomial trials at each time point, but for binary data, the chance of it can remain over 45% even in very long time series. We provide a combination of theoretical and heuristic explanations for this phenomenon in terms of the properties of the regression component of the model, and these can be used to guide application of the method in practice.
- Research Article
5
- 10.1177/0081175016654737
- Jul 9, 2016
- Sociological Methodology
- Xi Chen
As an emerging research area, application of satellite-based nighttime lights data in the social sciences has increased rapidly in recent years. This study, building on the recent surge in the use of satellite-based lights data, explores whether information provided by such data can be used to address attenuation bias in the estimated coefficient when the regressor variable, Gross Domestic Product (GDP), is measured with large error. Using an example of a study on infant mortality rates (IMRs) in the People’s Republic of China (PRC), this paper compares four models with different indicators of GDP as the regressor of IMR: (1) observed GDP alone, (2) lights variable as a substitute, (3) a synthetic measure based on weighted observed GDP and lights, and (4) GDP with lights as an instrumental variable. The results show that the inclusion of nighttime lights can reduce the bias in coefficient estimates compared with the model using observed GDP. Among the three approaches discussed, the instrumental-variable approach proves to be the best approach in correcting the bias caused by GDP measurement error and estimates the effect of GDP much higher than do the models using observed GDP. The study concludes that beyond the topic of this study, nighttime lights data have great potential to be used in other sociological research areas facing estimation bias problems due to measurement errors in economic indicators. The potential is especially great for those focusing on developing regions or small areas lacking high-quality measures of economic and demographic variables.
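The comparison between a mismeasured regressor and an instrumental-variable correction can be sketched on synthetic data. This is a minimal illustration, not the study's models or data: `true_gdp`, `observed_gdp`, `lights`, and `outcome` are hypothetical variables, the measurement errors are assumed independent of each other and of the outcome error, and the true coefficient of -0.5 is arbitrary.

```python
import numpy as np

def attenuation_demo(n=20000, beta=-0.5, seed=1):
    """Show attenuation from a noisy regressor and its correction with a second noisy proxy."""
    rng = np.random.default_rng(seed)
    true_gdp = rng.normal(size=n)                            # latent, never observed directly
    observed_gdp = true_gdp + rng.normal(size=n)             # measured with large error
    lights = 0.8 * true_gdp + rng.normal(scale=0.6, size=n)  # proxy with independent error
    outcome = beta * true_gdp + rng.normal(scale=0.5, size=n)

    # OLS on the noisy measure: slope is shrunk toward zero
    ols = np.cov(observed_gdp, outcome)[0, 1] / np.var(observed_gdp, ddof=1)
    # IV with lights as the instrument for observed GDP
    iv = np.cov(lights, outcome)[0, 1] / np.cov(lights, observed_gdp)[0, 1]
    return ols, iv

if __name__ == "__main__":
    ols, iv = attenuation_demo()
    print(f"OLS on observed GDP: {ols:.3f}  IV with lights: {iv:.3f}  (true: -0.500)")
```

With equal variances for the latent variable and its measurement error, the OLS slope converges to about half the true coefficient, while the IV slope recovers it because the two error terms are assumed unrelated. If the proxy's error were correlated with the error in observed GDP, the IV estimate would no longer be consistent.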
- Research Article
67
- 10.1016/j.ehb.2015.07.001
- Jul 23, 2015
- Economics & Human Biology
- John Cawley + 3 more
Reporting error in weight and its implications for bias in economic models
- Research Article
186
- 10.1177/1094428115595869
- Jul 19, 2015
- Organizational Research Methods
- Gordon W Cheung + 1 more
Currently, the most popular analytical method for testing moderated mediation is the regression approach, which is based on observed variables and assumes no measurement error. It is generally acknowledged that measurement errors result in biased estimates of regression coefficients. What has drawn relatively less attention is that the confidence intervals produced by regression are also biased when the variables are measured with errors. Therefore, we extend the latent moderated structural equations (LMS) method—which corrects for measurement errors when estimating latent interaction effects—to the study of the moderated mediation of latent variables. Simulations were conducted to compare the regression approach and the LMS approach. The results show that the LMS method produces accurate estimated effects and confidence intervals. By contrast, regression not only substantially underestimates the effects but also produces inaccurate confidence intervals. It is likely that the statistically significant moderated mediation effects that have been reported in previous studies using regression include biased estimated effects and confidence intervals that do not include the true values.
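The statement that measurement errors bias regression coefficients has a standard closed form in the simplest case. The expression below is a sketch under the classical errors-in-variables assumptions (one regressor, measurement error $u$ independent of the true regressor $x$ and of the outcome error $\varepsilon$), which are assumptions added here and not details taken from the article; it does not cover the moderated mediation setting the authors study.

```latex
x^{*} = x + u, \qquad y = \beta x + \varepsilon, \qquad
\operatorname{plim}\,\hat{\beta}_{\mathrm{OLS}}
  = \beta \cdot \frac{\sigma_x^{2}}{\sigma_x^{2} + \sigma_u^{2}}
```

Here $\hat{\beta}_{\mathrm{OLS}}$ comes from regressing $y$ on the observed $x^{*}$. The ratio $\sigma_x^{2} / (\sigma_x^{2} + \sigma_u^{2})$ is the reliability of $x^{*}$; since it is below 1 whenever $\sigma_u^{2} > 0$, the estimate is shrunk toward zero, which is consistent with the abstract's finding that regression underestimates the effects.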
- Research Article
36
- 10.3141/2514-17
- Jan 1, 2015
- Transportation Research Record: Journal of the Transportation Research Board
- Kai Wang + 4 more
In the transportation safety field, in an effort to improve safety, statistical models are developed to identify factors that contribute to crashes as well as those that affect injury severity. This study contributes to the literature on severity analysis. Injury severity and vehicle damage are two important indicators of severity in crashes and are typically modeled independently. However, there are common observed and unobserved factors affecting the two crash indicators that lead to potential interrelationships. Failure to account for the interrelationships between the indicators may lead to biased coefficient estimates in crash severity prediction models. The focus of this study was to explore interrelationships between injury severity and vehicle damage and to also identify the nature of these correlations across different types of crashes. A copula-based methodology that could simultaneously model injury severity and vehicle damage while also accounting for the interrelationships between the two indicators was employed. Furthermore, parameterization of the copula structure was used to represent the interrelationships between the crash indicators as a function of crash characteristics. In this study, six specifications of the copula model—Gaussian, Farlie–Gumbel–Morgenstern, Frank, Clayton, Joe, and Gumbel—were developed. On the basis of goodness-of-fit statistics, the Gaussian copula model was found to outperform the other copula-based model specifications. Results indicated that interrelationships between injury severity and vehicle damage varied with different crash characteristics including manner of collision and collision type.