Estimating group fixed effects in panel data with a binary dependent variable: How the LPM outperforms logistic regression in rare events data

Joan C Timoneda

doi:10.1016/j.ssresearch.2020.102486

Abstract

Estimating fixed effects models can be challenging with rare events data. Researchers often face difficult trade-offs when selecting among the Linear Probability Model (LPM), logistic regression with group intercepts, and the conditional logit. In this paper, I survey these trade-offs and argue that the LPM with fixed effects produces more accurate estimates and predicted probabilities than maximum likelihood specifications when fewer than 25 percent of the observations on the dependent variable are ones. I use Monte Carlo simulations to show when the LPM with fixed effects should be preferred. I perform these simulations on common time-series cross-sectional (TSCS) data structures found in the literature as well as on big data. This paper clarifies the choice among fixed effects models in TSCS data and offers a novel technique to identify which one to use as a function of the frequency of events in y.

Full Text