Deep venous thrombosis (DVT) is a critical medical condition that occurs when a blood clot forms in a deep vein, usually in the legs, and can lead to life-threatening complications such as pulmonary embolism if not detected early. Hospitalized patients, especially those who are immobile or recovering from surgery, are at higher risk of developing DVT, making early prediction and intervention vital for preventing severe outcomes. In this study, we evaluated the following eight machine learning models for predicting DVT risk: logistic regression, random forest, XGBoost, artificial neural networks, k-nearest neighbors, gradient boosting, CatBoost, and LightGBM. These models were rigorously tested using key metrics, including accuracy, precision, recall, F1-score, specificity, and the area under the receiver operating characteristic curve (AUROC), to determine their effectiveness in clinical prediction. Logistic regression emerged as the top-performing model, delivering high accuracy and an outstanding AUROC, reflecting its strong ability to discriminate between patients with and without DVT. Most importantly, the model's high recall underscores its ability to identify nearly all true DVT cases, significantly reducing the risk of false negatives—a critical concern in clinical settings, where delayed or missed diagnoses can result in life-threatening complications. Although models such as random forest and XGBoost also demonstrated competitive performance, logistic regression proved the most reliable across all metrics. These results suggest that machine learning models, particularly logistic regression, have great potential for early DVT detection, enabling timely clinical interventions and improved patient outcomes.
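The evaluation protocol described above can be sketched with scikit-learn. This is a minimal illustration only: the synthetic dataset, class balance, and train/test split below are assumptions standing in for the study's clinical data, which the abstract does not describe. Note that specificity has no dedicated scikit-learn scorer and is computed from the confusion matrix.

```python
# Hedged sketch of the abstract's evaluation pipeline using logistic regression.
# Synthetic imbalanced data is a placeholder for the (undescribed) clinical dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

# Simulate an imbalanced cohort: ~20% DVT-positive cases (assumed prevalence).
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.8, 0.2],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = model.predict(X_te)
y_prob = model.predict_proba(X_te)[:, 1]  # probabilities needed for AUROC

tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
metrics = {
    "accuracy": accuracy_score(y_te, y_pred),
    "precision": precision_score(y_te, y_pred),
    "recall": recall_score(y_te, y_pred),  # sensitivity: fraction of true cases caught
    "f1": f1_score(y_te, y_pred),
    "specificity": tn / (tn + fp),         # computed manually from the confusion matrix
    "auroc": roc_auc_score(y_te, y_prob),
}
print(metrics)
```

In a clinical setting, recall (sensitivity) would typically be prioritized over precision when choosing a decision threshold, since a missed DVT case carries far higher cost than a false alarm.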