Abstract

When a predictive model is in production, it must be monitored over time to ensure that its performance does not suffer from drift or abrupt changes to data. Typically this is done by comparing the algorithm's predictions against outcome data and ensuring that the algorithm maintains an acceptable level of accuracy over time. However, it is far preferable to learn in real time about major changes in the input data that could affect the model's performance, long before a drop in that performance becomes apparent from monitoring outcome data. Thus, there is a large need for robust, real-time monitoring of high-dimensional input data over time. Here we consider the problem of change point detection on high-dimensional longitudinal data with mixed variable types and missing values. We do this by fitting an array of Mixture Gaussian Graphical Models to groupings of homogeneous data in time, called regimes, which we model as the observed states of a Markov process with unknown transition probabilities. The primary goal of this model is to identify when there is a regime change, as this indicates a significant change in the input data distribution. To handle the messy nature of real-world data, which has mixed continuous/discrete variable types, missing data, etc., we take a Bayesian latent variable approach. This affords us the flexibility to handle missing values in a principled manner, while simultaneously providing a way to encode discrete and censored values into a continuous framework. We take this approach a step further by encoding the missingness structure, which allows our model to detect major changes in the patterns of missingness, in addition to changes in the structure of the data distributions themselves. We assess our approach on simulated data, and apply it to an in-production model for the need for a palliative care consult at

