Background and Aims
Non-alcoholic fatty liver disease (NAFLD) is one of the most common liver diseases. There are no universally accepted models that accurately predict time to onset of NAFLD. Machine learning (ML) models may allow prediction of such time-to-event (i.e., survival) outcomes. This study aims to develop and independently validate ML-derived models that provide personalised predictions of time to NAFLD onset in individuals without NAFLD at baseline.

Methods
The development dataset comprised 25,599 individuals from a South Korean NAFLD registry and was randomly split 70:30 into training and internal validation sets. Two ML survival models, Random Survival Forest (RSF) and Extra Survival Trees (XST), were fitted with time to NAFLD diagnosis (in months) as the target variable and routine anthropometric and laboratory parameters as predictors. The independent validation dataset comprised 16,173 individuals from a Chinese open dataset. Models were evaluated on both the internal and independent validation sets using the concordance index (c-index) and the Brier score.

Results
The development and independent validation datasets had 1,331,107 vs 543,874 person-months of follow-up, a NAFLD incidence of 25.7% (6,584 individuals) vs 14.4% (2,322 individuals), and a median time to NAFLD onset of 60 (IQR 38-75) vs 24 (IQR 13-37) months, respectively. Both ML models achieved a good c-index (>0.7) in the validation cohort: RSF 0.751 (95% CI 0.742-0.759) and XST 0.752 (95% CI 0.744-0.762).

Conclusion
ML models can predict time to NAFLD onset from routine patient data. Clinicians can use them to deliver personalised predictions to patients, which may support patient counselling and clinical decisions about the timing of interval imaging.
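The sketch below illustrates what the c-index measures. It is a minimal, dependency-free Python version of Harrell's concordance index for right-censored data. It assumes that a higher risk score means earlier predicted onset. The function name `harrell_c_index` and the toy data are hypothetical. The abstract does not state which c-index variant (e.g., Harrell's or Uno's) or which software the study used. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why values above 0.7 are generally read as good discrimination.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       follow-up time per subject (e.g. months to diagnosis or censoring)
    events:      1 if the event (e.g. NAFLD diagnosis) was observed, 0 if censored
    risk_scores: model output; assumed here that higher = earlier predicted onset
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if subject i had the event
            # strictly before subject j's recorded (event or censoring) time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # model ranked the earlier event as higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: the third subject is censored at 30 months.
times = [10, 20, 30, 40]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.5, 0.1]
print(harrell_c_index(times, events, risks))  # 1.0: every comparable pair is ordered correctly
```

The double loop is O(n^2) and is meant only for illustration. Survival-analysis libraries provide vectorised implementations that are better suited to cohorts of tens of thousands of individuals.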