Recurrent graft fibrosis after liver transplantation can threaten both graft and patient survival. Early detection of fibrosis is therefore essential to halt disease progression and avoid the need for retransplantation. Non-invasive blood-based biomarkers of fibrosis are limited by moderate accuracy and high cost. We aimed to evaluate the accuracy of machine learning algorithms in detecting graft fibrosis using longitudinal clinical and laboratory data.

In this retrospective, longitudinal study, we trained machine learning algorithms, including our novel weighted long short-term memory (LSTM) model, to predict the risk of significant fibrosis using follow-up data from 1893 adults who underwent liver transplantation between Feb 1, 1987, and Dec 30, 2019, and had at least one liver biopsy after transplantation. Liver biopsy samples with an indeterminate fibrosis stage and samples from patients who had multiple transplantations were excluded. Longitudinal clinical variables were collected from transplantation to the date of the last available liver biopsy. Deep learning models were trained on 70% of the patients (training set) and tested on the remaining 30% (test set). The algorithms were also tested separately on longitudinal data from a subgroup of patients (n=149) who had transient elastography within 1 year before or after the date of liver biopsy. The performance of the weighted LSTM model for diagnosing significant fibrosis was compared against that of the unweighted LSTM, other deep learning models (recurrent neural network and temporal convolutional network), machine learning models (random forest, support vector machines, logistic regression, lasso regression, and ridge regression), the aspartate aminotransferase-to-platelet ratio index (APRI), the fibrosis-4 index (FIB-4), and transient elastography.
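The 70/30 split is described as a split of patients rather than of biopsies, which keeps all of one patient's longitudinal records on the same side of the split. A minimal sketch of such a patient-level split follows; the record layout `(patient_id, ...)`, the function name `split_by_patient`, and the fixed seed are illustrative assumptions, not the study's code.

```python
import random


def split_by_patient(records, train_frac=0.7, seed=0):
    """Split records of the form (patient_id, ...) at the patient level.

    Every record of a given patient goes to exactly one of the two sets, so
    longitudinal data from one patient cannot leak from training into testing.
    (Hypothetical helper for illustration.)
    """
    patient_ids = sorted({r[0] for r in records})  # sorted so the seed fully determines the split
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_train = round(train_frac * len(patient_ids))
    train_ids = set(patient_ids[:n_train])
    train = [r for r in records if r[0] in train_ids]
    test = [r for r in records if r[0] not in train_ids]
    return train, test
```

With 1893 patients and `train_frac=0.7`, this sketch assigns 1325 patients to training and 568 to testing. Whether the study additionally stratified the split by fibrosis status is not stated in the abstract.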
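APRI and FIB-4, two of the comparators named above, are computed from routine laboratory values using their standard published formulas: APRI = (AST / AST upper limit of normal) / platelets × 100, and FIB-4 = age × AST / (platelets × √ALT). The sketch below implements those formulas; the function names, the argument units (AST and ALT in U/L, platelets in 10⁹/L, age in years), and the caller-supplied AST upper limit of normal (which is laboratory-specific) are assumptions, and the abstract does not say which cut-offs, if any, the study applied.

```python
import math


def apri(ast_u_l, ast_uln_u_l, platelets_10e9_l):
    """AST-to-platelet ratio index: (AST / AST upper limit of normal) / platelets x 100."""
    if min(ast_u_l, ast_uln_u_l, platelets_10e9_l) <= 0:
        raise ValueError("inputs must be positive")
    return (ast_u_l / ast_uln_u_l) / platelets_10e9_l * 100


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """Fibrosis-4 index: age x AST / (platelets x sqrt(ALT))."""
    if min(age_years, ast_u_l, alt_u_l, platelets_10e9_l) <= 0:
        raise ValueError("inputs must be positive")
    return age_years * ast_u_l / (platelets_10e9_l * math.sqrt(alt_u_l))
```

For example, `apri(80, 40, 100)` returns 2.0 and `fib4(50, 80, 64, 100)` returns 5.0. Both indices use a single visit's values, which is the contrast the longitudinal models are meant to improve on.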
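Model performance in this study is summarised as the area under the ROC curve (AUC) with a 95% CI. The abstract does not say how those intervals were derived; the sketch below shows one common approach, assumed here for illustration. It computes the AUC as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, and it derives a percentile bootstrap interval by resampling patients with replacement. The function names and defaults are hypothetical.

```python
import random


def auc(labels, scores):
    """ROC AUC: probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg) pairwise comparison; fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) confidence interval; labels are 0/1."""
    point = auc(labels, scores)  # also checks that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = rng.choices(range(n), k=n)  # resample patients with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lo, hi
```

The pairwise loop is easy to verify but slow for large test sets; scikit-learn's `roc_auc_score` computes the same quantity more efficiently and could replace `auc` inside the bootstrap loop.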
In total, 1893 liver transplant recipients (1261 [67%] men and 632 [33%] women) with at least one liver biopsy between Jan 1, 1992, and June 30, 2020, were included in the study (591 [31%] cases and 1302 [69%] controls). The median age at liver transplantation was 53·7 years (IQR 47·3 to 59·0) for cases and 55·3 years (48·0 to 61·2) for controls. The median interval between transplantation and liver biopsy was 21 months (5 to 71). For diagnosing fibrosis of stage F2 or worse, the weighted LSTM model (area under the curve 0·798 [95% CI 0·790 to 0·810]) consistently outperformed the other methods, including the unweighted LSTM (0·761 [0·750 to 0·769]; p=0·031), recurrent neural network (0·736 [0·721 to 0·744]), temporal convolutional network (0·700 [0·662 to 0·747]), random forest (0·679 [0·652 to 0·707]), FIB-4 (0·650 [0·636 to 0·663]), and APRI (0·682 [0·671 to 0·694]). In the subgroup of patients with transient elastography results, the weighted LSTM was not significantly better at detecting fibrosis of stage F2 or worse (0·705 [0·687 to 0·724]) than transient elastography (0·685 [0·662 to 0·704]). The top ten variables predictive of significant fibrosis were recipient age, primary indication for transplantation, donor age, and longitudinal measurements of creatinine, alanine aminotransferase, aspartate aminotransferase, total bilirubin, platelets, white blood cell count, and weight.

Deep learning algorithms, particularly the weighted LSTM, outperform other routinely used non-invasive modalities and could help with the earlier diagnosis of graft fibrosis using longitudinal clinical and laboratory variables. The list of the most important predictive variables for the development of fibrosis will enable clinicians to adjust their management accordingly to prevent the onset of graft cirrhosis.

Funding: Canadian Institutes of Health Research, American Society of Transplantation, Toronto General and Western Hospital Foundation, and Paladin Labs.