We aimed to develop and validate machine learning algorithms to predict direct-acting antiviral (DAA) treatment failure among patients with hepatitis C virus (HCV) infection. We used HCV-TARGET registry data to identify HCV-infected adults who received all-oral DAA treatment and had a recorded virologic outcome. Potential pretreatment predictors (n=179) included sociodemographic characteristics, clinical characteristics, and virologic data. We applied multivariable logistic regression and four machine learning algorithms (elastic net, random forest, gradient boosting machine [GBM], and feedforward neural network) to predict DAA treatment failure. The training (n=4894) and validation (n=1631) samples had similar sociodemographic and clinical characteristics (mean age, 57 years; 60% male; 66% White; 36% with cirrhosis). Of 6525 HCV-infected adults, 95.3% achieved sustained virologic response, whereas 4.7% experienced DAA treatment failure. In the validation sample, the machine learning approaches performed similarly in predicting DAA treatment failure (C statistic [95% CI]: GBM, 0.69 [0.64-0.74]; random forest, 0.68 [0.63-0.73]; feedforward neural network, 0.66 [0.60-0.71]; elastic net, 0.64 [0.59-0.70]), and all four outperformed multivariable logistic regression (0.51 [0.46-0.57]). At the balanced risk score threshold identified with the Youden index, GBM had 66.2% sensitivity and 65.1% specificity, and the number needed to evaluate to identify 1 DAA treatment failure was 12. The GBM classified over 55% of patients with treatment failure into the top three risk decile subgroups (positive predictive value: 6%-14%). The top 10 GBM-identified predictors were albumin level, liver enzyme levels (aspartate aminotransferase, alkaline phosphatase), total bilirubin level, sex, HCV viral load, sodium level, hepatocellular carcinoma (HCC), platelet level, and tobacco use. Machine learning algorithms, particularly GBM, outperformed multivariable logistic regression for risk prediction and stratification of DAA treatment failure.
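
The relation between the reported sensitivity, specificity, and number needed to evaluate can be illustrated with a short calculation. The following is a minimal sketch, not the study's code: the function names are hypothetical, and it assumes that the failure prevalence in the validation sample equals the overall 4.7% and that the number needed to evaluate is the reciprocal of the positive predictive value. The `youden_threshold` helper shows one simple way to choose a cutoff that maximizes sensitivity + specificity - 1 on toy data.

```python
import math


def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def number_needed_to_evaluate(ppv):
    """Patients flagged as high risk per true treatment failure (assumed 1 / PPV)."""
    return 1.0 / ppv


def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A patient is classified as high risk when score >= threshold.
    labels: 1 = treatment failure, 0 = sustained virologic response.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Values taken from the abstract; prevalence in the validation sample is an assumption.
ppv = ppv_from_rates(sensitivity=0.662, specificity=0.651, prevalence=0.047)
nne = number_needed_to_evaluate(ppv)
print(round(ppv, 3), math.ceil(nne))  # 0.086 12
```

Under these assumptions, the positive predictive value at the Youden threshold is about 8.6%, which rounds up to the 12 patients evaluated per treatment failure identified.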