The event-based model constructs a discrete picture of disease progression from cross-sectional data sets, with each event corresponding to a new biomarker becoming abnormal. However, it relies on the assumption that all subjects follow a single event sequence. This is a major simplification for sporadic disease data sets, which are highly heterogeneous, include distinct subgroups, and contain significant proportions of outliers. In this work we relax this assumption by considering two extensions to the event-based model: a generalised Mallows model, which allows subjects to deviate from the main event sequence, and a Dirichlet process mixture of generalised Mallows models, which models clusters of subjects that follow different event sequences, each of which has a corresponding variance. We develop a Gibbs sampling technique to infer the parameters of the two models from multi-modal biomarker data sets. We apply our technique to data from the Alzheimer's Disease Neuroimaging Initiative to determine the sequence in which brain regions become abnormal in sporadic Alzheimer's disease, as well as the heterogeneity of that sequence in the cohort. We find that the generalised Mallows model estimates a larger variation in the event sequence across subjects than the original event-based model. Fitting the Dirichlet process mixture model detects three subgroups of the population with different event sequences. The Gibbs sampler additionally provides an estimate of the uncertainty in each of the model parameters, for example an individual's latent disease stage and cluster assignment. The distributions and mixtures of sequences that this new family of models introduces offer better characterisation of disease progression in heterogeneous populations and new insight into disease mechanisms, and they have the potential to enhance disease stratification and differential diagnosis.
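To make concrete the idea of subjects deviating from a main event sequence, the following minimal sketch draws event sequences from a generalised Mallows model. The abstract does not give the parametrisation, so this assumes the standard inversion-count form, in which the j-th event of the central sequence has its own dispersion parameter and is overtaken by r later events with probability proportional to exp(-theta_j * r). The function names, the event labels, and the parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_generalised_mallows(central, theta, rng):
    """Draw one event sequence around `central` (assumed parametrisation).

    central: list of event labels giving the central sequence.
    theta:   len(central) - 1 non-negative dispersion parameters; a larger
             theta[j] keeps event central[j] closer to its central position.
    rng:     a random.Random instance.
    """
    n = len(central)
    if n == 0:
        return []
    if len(theta) != n - 1:
        raise ValueError("theta needs one entry per event except the last")

    # Build the sequence from the last central event backwards. Inserting
    # central[j] at index v places exactly v of the later events ahead of it.
    seq = [central[-1]]
    for j in range(n - 2, -1, -1):
        max_v = n - 1 - j  # number of events after position j in `central`
        weights = [math.exp(-theta[j] * r) for r in range(max_v + 1)]
        v = rng.choices(range(max_v + 1), weights=weights)[0]
        seq.insert(v, central[j])
    return seq


def kendall_tau_distance(a, b):
    """Number of event pairs that appear in opposite order in a and b."""
    pos = {item: i for i, item in enumerate(b)}
    ranks = [pos[x] for x in a]
    return sum(
        ranks[i] > ranks[k]
        for i in range(len(ranks))
        for k in range(i + 1, len(ranks))
    )


if __name__ == "__main__":
    rng = random.Random(0)
    central = ["A", "B", "C", "D", "E"]  # hypothetical event labels
    theta = [1.0, 1.0, 0.5, 0.5]         # hypothetical dispersion values
    for _ in range(3):
        s = sample_generalised_mallows(central, theta, rng)
        print(s, kendall_tau_distance(s, central))
```

Under this assumed form, very large dispersion values reproduce the single-sequence assumption of the original event-based model, while values near zero make every ordering equally likely. This sketch covers sampling only; it is not the Gibbs sampler used for inference.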