Background: Our aim was to explore whether a two-step hybrid machine learning (ML) model has the potential to detect the onset of depression in home-based older adults.

Methods: Depression data (collected in 2011, 2013, 2015 and 2018) from home-based older Chinese adults (n = 2,548) recruited in the China Health and Retirement Longitudinal Study were included in the current analysis. A long short-term memory (LSTM) network was applied to identify the risk factors of participants in 2015 using the first two waves of data. Based on the identified predictors, three ML classification algorithms (gradient boosting decision tree, support vector machine and random forest) were used to estimate the depressive outcome and were evaluated with 10-fold cross-validation, using the area under the receiver operating characteristic curve (AUROC) as the metric.

Results: Time-varying predictors of depression were successfully identified by the LSTM (mean squared error = 0.8). The mean AUROCs of the three predictive models ranged from 0.703 to 0.749. Among the prediction variables, self-reported health status, cognition, sleep time, self-reported memory and activities of daily living (ADL) disorder were the five most important.

Conclusions: A two-step hybrid model based on an “LSTM+ML” framework can be robust in predicting depression over a 5-year period with easily accessible sociodemographic and health information.
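The evaluation step described in the Methods (10-fold cross-validation scored by mean AUROC) can be illustrated with a minimal, standard-library-only sketch. It does not reproduce the study's pipeline: the LSTM step and the three classifiers are omitted, `fit_centroid_scorer` is a hypothetical stand-in for a classifier, the use of stratified folds is an assumption (the abstract does not say how folds were formed), and the toy data are synthetic.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def stratified_folds(y, k=10, seed=0):
    """Deal the shuffled indices of each class round-robin into k folds (assumed scheme)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def fit_centroid_scorer(X_train, y_train):
    """Hypothetical stand-in classifier: project onto the difference of class means."""
    dims = range(len(X_train[0]))
    pos = [x for x, label in zip(X_train, y_train) if label == 1]
    neg = [x for x, label in zip(X_train, y_train) if label == 0]
    direction = [mean(x[d] for x in pos) - mean(x[d] for x in neg) for d in dims]
    return lambda x: sum(w * v for w, v in zip(direction, x))


def cross_validated_auroc(X, y, fit, k=10, seed=0):
    """Fit on k-1 folds, score the held-out fold, and return the mean AUROC over folds."""
    fold_aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_aucs.append(
            auroc([y[i] for i in test_idx], [score(X[i]) for i in test_idx])
        )
    return mean(fold_aucs)


# Synthetic toy data (not study data): one feature, 20 negatives then 20 positives.
X_toy = [[float(i)] for i in range(40)]
y_toy = [0] * 20 + [1] * 20
print(cross_validated_auroc(X_toy, y_toy, fit_centroid_scorer, k=10))
```

In practice the `fit` argument would wrap a real classifier, and each candidate model would be passed through the same folds so that the mean AUROCs are comparable.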