Data-driven disease progression models are an emerging set of computational tools that reconstruct disease timelines for chronic diseases, providing unique insights into disease processes and their underlying mechanisms. Such methods combine a priori human knowledge and assumptions with large-scale data processing and parameter estimation to infer long-term disease trajectories from short-term data. In contrast to 'black box' machine learning tools, data-driven disease progression models typically require fewer data and are inherently interpretable, thereby aiding disease understanding in addition to enabling classification, prediction and stratification. In this Review, we place the current landscape of data-driven disease progression models in a general framework and discuss their enhanced utility for constructing a disease timeline compared with wider machine learning tools that construct static disease profiles. We review the insights they have enabled across multiple neurodegenerative diseases, notably Alzheimer disease, for applications such as determining temporal trajectories of disease biomarkers, testing hypotheses about disease mechanisms and uncovering disease subtypes. We outline key areas for technological development and translation to a broader range of neuroscience and non-neuroscience applications. Finally, we discuss potential pathways and barriers to integrating disease progression models into clinical practice and trial settings.