Abstract

Background and Aims
Acute kidney injury (AKI) contributes substantially to the global disease burden of chronic kidney disease. To assist physicians with the timely diagnosis of AKI, several prognostic models have been developed to improve early recognition across various patient populations, with varying degrees of predictive performance. In the prediction of AKI, machine learning (ML) techniques have been shown to improve on the predictive ability of existing models that rely on more conventional statistical methods. ML is a broad term covering various types of models. Parametric models, such as linear or logistic regression, use a pre-specified model form assumed to fit the data, and only its parameters are estimated from the data. Non-parametric models, such as decision trees, random forests, and neural networks, have a complexity (e.g. the depth of a classification tree) that is determined by the data. Deep learning neural network models exploit temporal or spatial arrangements in the data to deal with complex predictors. Given the rapid growth of ML methods and models for AKI prediction in recent years, this systematic review aims to appraise the current state of the art regarding ML models for the prediction of AKI. To this end, we focus on model performance, model development methods, model evaluation, and methodological limitations.

Method
We searched the PubMed and arXiv digital libraries and selected studies that developed or validated an AKI-related multivariable ML prediction model. We extracted data using a data extraction form based on the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) and CHARMS (critical appraisal and data extraction for systematic reviews of prediction modelling studies) checklists.

Results
Overall, 2,875 titles were screened and 34 studies were included. Of those, 13 studies focussed on intensive care, for which the US-derived MIMIC dataset was commonly used; 31 studies both developed and validated a model; 21 studies used single-centre data. Non-parametric ML methods were used more often than regression and deep learning. Random forest was the most popular method and often performed best in model comparisons. Deep learning was typically used, and was also effective, when complex features such as text or time series were included. Internal validation was often applied, and the performance of ML models was usually compared against logistic regression. However, a simple training/test split was often used, which does not account for the variability of the training and test samples. Calibration, external validation, and interpretability of results were rarely considered. Comparisons of model performance against clinical scores or clinicians were also rare. Reproducibility was limited, as data and code were usually unavailable.

Conclusion
The number of ML models for AKI is increasing; most are developed in the intensive care setting, largely due to the availability of the MIMIC dataset. Most studies are single-centre and lack a prospective design. More complex models based on deep learning are emerging, with the potential to improve predictions for complex data such as time series, but with the disadvantage of being less interpretable. Future studies should pay attention to calibration measures, external validation, and model interpretability in order to improve uptake in clinical practice. Finally, sharing data and code could improve the reproducibility of study findings.
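As an illustration of the evaluation points raised above, the sketch below (Python with scikit-learn, not taken from any of the reviewed studies) compares a parametric model (logistic regression) with a non-parametric one (random forest) using repeated stratified cross-validation rather than a single training/test split, and reports both discrimination (AUC) and a calibration-sensitive measure (the Brier score). The synthetic dataset, class balance, and model settings are assumptions chosen purely for illustration.

# Illustrative sketch only: synthetic data stands in for an AKI cohort; the
# feature set, class balance, and model settings are assumptions, not values
# taken from any of the reviewed studies.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary outcome (roughly 10% positive) as a placeholder dataset.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)

models = {
    # Parametric: pre-specified (linear) model form, only coefficients estimated from data.
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    # Non-parametric: model complexity is determined by the data.
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
}

# Repeated stratified k-fold CV instead of a single train/test split, so the
# spread of performance across resamples is visible.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)

for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv,
                            scoring=["roc_auc", "neg_brier_score"])
    auc = scores["test_roc_auc"]
    brier = -scores["test_neg_brier_score"]  # Brier score: lower means better calibration
    print(f"{name}: AUC {auc.mean():.3f} ± {auc.std():.3f}, "
          f"Brier {brier.mean():.3f} ± {brier.std():.3f}")

A single split would yield one performance estimate per model, whereas repeating the resampling exposes the spread across folds, which is the variability that the review notes is often ignored.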