Analysis of Rare Events

Heinz Leitgöb

doi:10.4135/9781526421036863804

Abstract

Rare events pose a considerable analytical challenge. The binary logit model estimated by maximum likelihood (ML), the workhorse model in the social sciences, can yield heavily biased parameter estimates when events are rare. Specifically, the finite-sample bias of the ML estimates may be substantially larger than the bias observed in balanced data of the same sample size. Furthermore, the ML estimator is prone to overfitting rare event data even in low-dimensional models and is not identified when the data are perfectly separated. Starting with a brief introduction to the standard binary logit model as a reference, this entry discusses several design issues (e.g., selection on the dependent variable) and analytical approaches (e.g., first-order bias correction, exact conditional inference, penalized ML estimation, and the specification of complementary log-log [cloglog] models) for overcoming these threats to valid inference. Finally, the potential of Bayesian rare event modeling, which addresses some limitations of the frequentist perspective on probability, is briefly introduced.

Full Text