Abstract

Background/Introduction: Cardiac amyloidosis (CA) is a rare and complex condition with a poor prognosis. Novel therapies have been shown to improve outcomes; however, most affected individuals remain undiagnosed, mainly because of a lack of awareness among clinicians. One approach to overcoming this problem is to use automated diagnostic algorithms that operate on routinely available laboratory results.
Purpose: We tested the performance of flexible machine learning and traditional statistical prediction models for non-invasive CA diagnosis based on routinely collected laboratory parameters. Because laboratory routines vary between hospitals and other health care providers, special attention was paid to adaptive and dynamic parameter selection and to handling the frequent occurrence of missing values.
Methods: Our cohort consisted of 376 clinically accepted patients with various types of heart failure. Of these, 69 were diagnosed with CA via endomyocardial biopsy (positives), and 307 had unrelated cardiac disorders (negatives). A total of 63 routine laboratory parameters were collected from these patients, with a high rate of missing values (on average, each parameter was missing in 60% of patients). We tested two prediction models: logistic regression and extreme gradient boosting with regression trees. To deal with missing values we adopted two strategies: a) finding an optimal overlap of parameters and deleting all patients with missing values (reducing both parameters and samples), and b) retaining all features and imputing missing values with parameter-wise means. To assess the prediction models fairly, we employed 10-fold cross-validation, stratified to preserve the class ratio. The area under the receiver operating characteristic curve (ROC AUC) was used as the performance measure.
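The two missing-value strategies named in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy records, the parameter names (`param_a` and so on), and the helper names are hypothetical, and `None` stands in for a missing laboratory value. The abstract does not say how the "optimal overlap" of parameters was found, so the sketch simply assumes it means taking the k most frequently measured parameters.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]  # one patient: parameter name -> value or None


def most_frequent_parameters(rows: List[Row], k: int) -> List[str]:
    """Return the k parameters with the most non-missing values.

    Assumed selection rule; ties are broken alphabetically.
    """
    names = sorted({name for row in rows for name in row})
    counts = {n: sum(1 for r in rows if r.get(n) is not None) for n in names}
    return sorted(names, key=lambda n: (-counts[n], n))[:k]


def complete_cases(rows: List[Row], params: List[str]) -> List[Dict[str, float]]:
    """Strategy (a): keep only `params`, drop patients missing any of them."""
    kept = []
    for row in rows:
        values = [row.get(p) for p in params]
        if all(v is not None for v in values):
            kept.append(dict(zip(params, values)))
    return kept


def impute_parameter_means(rows: List[Row]) -> List[Row]:
    """Strategy (b): keep every patient and parameter, fill gaps with the
    mean of the observed values of that parameter."""
    names = sorted({name for row in rows for name in row})
    means: Dict[str, Optional[float]] = {}
    for n in names:
        observed = [r[n] for r in rows if r.get(n) is not None]
        # A parameter that was never measured stays missing.
        means[n] = sum(observed) / len(observed) if observed else None
    return [
        {n: (r.get(n) if r.get(n) is not None else means[n]) for n in names}
        for r in rows
    ]


# Hypothetical toy records, not data from the study.
patients: List[Row] = [
    {"param_a": 1.0, "param_b": 10.0, "param_c": None},
    {"param_a": None, "param_b": 30.0, "param_c": 4.0},
    {"param_a": 3.0, "param_b": 20.0, "param_c": 6.0},
    {"param_a": 2.0, "param_b": None, "param_c": None},
]

selected = most_frequent_parameters(patients, k=2)
reduced = complete_cases(patients, selected)
imputed = impute_parameter_means(patients)
print(f"strategy (a): {len(reduced)} patients x {len(selected)} parameters")
print(f"strategy (b): {len(imputed)} patients x {len(imputed[0])} parameters")
```

Strategy (a) trades sample size for complete data, while strategy (b) keeps every patient at the cost of filling gaps with a constant. Mean imputation should be fitted on the training folds only when combined with cross-validation, to avoid leaking information from held-out patients.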
Results: A complex machine learning model based on forests of regression trees performed best (ROC AUC 0.94±4%) and was robust to missing values. The best logistic regression model was obtained with the 25 most frequently measured variables and deletion of patients with missing values (ROC AUC 0.82±0.8%). Progressive inclusion of predictor variables worsened the performance of logistic regression but improved that of the machine learning approach.
Conclusions: Extreme gradient boosting of regression trees on routine laboratory parameters achieved high accuracy for the automated diagnosis of CA. Our data suggest that implementing such algorithms as independent interpreters of routine laboratory results may help to establish or suggest the diagnosis of CA in patients with heart failure symptoms, even in the absence of specialized experts.
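The evaluation protocol described in the Methods (stratified 10-fold cross-validation scored by ROC AUC) can be sketched in plain Python as follows. This is an illustrative sketch, not the study's code: the single synthetic feature, the `centroid_scorer` stand-in model, and all helper names are hypothetical, and the stand-in is neither logistic regression nor gradient boosting. Only the class sizes (69 positives, 307 negatives) come from the abstract. Summarizing the per-fold values as mean ± standard deviation is an assumption about what the reported ± denotes.

```python
import random
from statistics import mean, stdev
from typing import Callable, List, Sequence


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that preserve the class ratio."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled members of each class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a
    random negative (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(
    labels: Sequence[int],
    fit_and_score: Callable[[List[int], List[int]], List[float]],
    k: int = 10,
    seed: int = 0,
) -> List[float]:
    """Return one held-out ROC AUC per fold."""
    aucs = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        scores = fit_and_score(train_idx, test_idx)
        aucs.append(roc_auc([labels[i] for i in test_idx], scores))
    return aucs


# Synthetic demo: class sizes from the abstract, feature values invented.
rng = random.Random(42)
labels = [1] * 69 + [0] * 307
feature = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]


def centroid_scorer(train_idx: List[int], test_idx: List[int]) -> List[float]:
    """Stand-in model: learn only the sign of the class-mean difference."""
    pos = [feature[i] for i in train_idx if labels[i] == 1]
    neg = [feature[i] for i in train_idx if labels[i] == 0]
    direction = 1.0 if mean(pos) >= mean(neg) else -1.0
    return [direction * feature[i] for i in test_idx]


aucs = cross_validated_auc(labels, centroid_scorer, k=10)
print(f"ROC AUC {mean(aucs):.2f} ± {stdev(aucs):.2f} over {len(aucs)} folds")
```

Passing the model as a `fit_and_score` callable keeps the protocol independent of the classifier, so the same loop could wrap either of the two model families compared in the abstract.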

