Abstract

Propensity scores are often used to adjust for between-group variation in covariates when individuals cannot be randomized to groups. There is great flexibility in how these scores can appropriately be used. This flexibility may encourage p-value hacking, in which several alternative uses of propensity scores are explored and the one yielding the lowest p-value is selectively reported. Such unreported multiple testing inevitably inflates type I error rates; our focus is on how strong this inflation effect can be. Across three different scenarios, we compared the performance of four different methods. Each, taken individually, gave a type I error rate near the nominal (5%) value, but taking the minimum of the four p-values led to actual error rates between 150% and 200% of the nominal value. We therefore strongly recommend pre-specifying the details of the statistical treatment of propensity scores, to avoid the risk of very serious inflation of type I error rates.
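The mechanism described in the abstract can be illustrated with a small Monte Carlo sketch. The specific analyses below (four standard two-sample tests applied to the same data) are an assumption for illustration only, not the propensity-score methods compared in the paper; the point is the same: under a true null, each test alone rejects at about the nominal rate, but reporting only the smallest of the four p-values rejects more often.

```python
# Illustrative simulation of p-value hacking under a true null.
# The four tests used here are hypothetical stand-ins, not the paper's methods.
import numpy as np
from scipy import stats

def pvalue_hacking_sim(n=50, nsim=2000, alpha=0.05, seed=1):
    rng = np.random.default_rng(seed)
    single_rejections = 0   # rejections using the first test alone
    min_rejections = 0      # rejections when the smallest p-value is reported
    for _ in range(nsim):
        # Both samples come from the same distribution, so the null is true.
        x = rng.normal(size=n)
        y = rng.normal(size=n)
        pvals = [
            stats.ttest_ind(x, y).pvalue,                    # Student t
            stats.ttest_ind(x, y, equal_var=False).pvalue,   # Welch t
            stats.mannwhitneyu(x, y).pvalue,                 # Mann-Whitney U
            stats.ttest_ind(np.clip(x, -2, 2),
                            np.clip(y, -2, 2)).pvalue,       # t on clipped data
        ]
        single_rejections += pvals[0] < alpha
        min_rejections += min(pvals) < alpha
    return single_rejections / nsim, min_rejections / nsim

single_rate, hacked_rate = pvalue_hacking_sim()
```

Because the four tests are applied to the same data, they are highly correlated, so the inflation is milder than the 1 - 0.95^4 ≈ 18.5% that four independent tests would give, but the "hacked" rate still exceeds the nominal 5%.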
