Background
Increased costs in the health sector have put considerable strain on the public budgets allocated to pharmaceutical purchases. Faced with such pressures, amplified by financial crises and pandemics, national purchasing authorities are presented with a puzzle: how to procure pharmaceuticals of the highest quality for the lowest price. The literature has explored a range of influential factors using data on producer and reference prices, but has largely forgone the use of data on individual purchases by diverse public buyers.

Methods
Leveraging the availability of open public procurement data from official government portals, the article examines the relationship between unit prices and a host of predictors that account for policies that can be amended nationally or locally. The study uses traditional linear regression (OLS) and a machine learning model, random forest, to identify the best models for predicting pharmaceutical unit prices. To explore the association between a wide variety of predictors and unit prices, the study relies on more than 200,000 purchases in more than 800 standardized pharmaceutical product categories from 10 countries and territories.

Results
The results show significant price variation of standardized products between and within countries. Although both models present substantial potential for predicting unit prices, the random forest model, which can incorporate non-linear relationships, leads to higher explained variance (R² = 0.85) and lower prediction error (RMSE = 0.81).

Conclusions
The results demonstrate the potential of (i) tapping into large quantities of purchase-level data in the health care sector and (ii) using machine learning models for explaining and predicting pharmaceutical prices. The explanatory models identify data-driven policy interventions for decision-makers seeking to improve value for money.
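
The two metrics used to compare the models in the Results, explained variance (R²) and root mean squared error (RMSE), can be illustrated with a minimal sketch. It is not the study's code: the function names and the toy price values are hypothetical, and the abstract does not state whether the metrics were computed on raw or transformed unit prices.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Share of variance in the observed values explained by the predictions."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy data: observed unit prices and two sets of predictions.
    observed = [1.0, 2.0, 3.0, 4.0]
    model_a = [1.5, 2.5, 2.5, 3.5]  # stands in for a weaker model
    model_b = [1.1, 1.9, 3.2, 3.8]  # stands in for a stronger model

    for name, preds in (("model_a", model_a), ("model_b", model_b)):
        print(name, "R2 =", round(r_squared(observed, preds), 3),
              "RMSE =", round(rmse(observed, preds), 3))
```

A model is preferred on these criteria when it has the higher R² and the lower RMSE on held-out purchases, which is the sense in which the abstract reports the random forest outperforming OLS.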