e13612

Background: Genitourinary (GU) cancers contribute substantially to healthcare costs in the United States (US). Prostate cancer treatment alone cost the US more than $22 billion in 2020. We sought to use machine learning (ML)-based classification to investigate differences in total healthcare expenditure among patients with GU cancer in the US.

Methods: The study included eligible adult GU cancer cases from the 2019–2021 Medical Expenditure Panel Survey. Renal cell, prostate, urothelial, testicular, and penile cancers were identified using ICD-10-CM codes. Patients were clustered using sociodemographic information: age, race, sex, marital status, highest degree, insurance status, perceived health status, and perceived mental health status. GU cancer cases were divided into low-, medium- (med-), and high-risk healthcare expenditure clusters. The analysis accounted for the complex survey design and sampling weights, and the descriptive and regression analyses accounted for the cluster groups.

Results: The study included a weighted sample representing 9,061,181 cases of GU cancer in the US between 2019 and 2021. The mean age of the sample was 71.6 years (95% CI: 70.7, 72.5). The sample was 7.9% Hispanic (Hisp), 74.8% non-Hispanic White (NHW), 13.8% non-Hispanic Black (NHB), and 3.5% Asian or other races. Overall, the mean total healthcare expenditures in the low-, med-, and high-risk clusters were $17,974 (13,816, 22,133), $20,496 (13,100, 27,893), and $30,512 (21,389, 39,635), respectively. Regression analysis showed that, after adjusting for sampling weight, cluster, gender, family income, age, insurance status, and survey year, the mean total expenditure was higher by $11,209.70 and $13,100.81 in the med- and high-risk clusters, respectively, than in the low-risk cluster (ref.). By race and ethnicity, Hisp and NHB individuals made up a higher proportion of the med- and high-risk clusters. Med- and high-risk clusters were also more likely to contain people with cognitive impairments, poor health status, and family financial difficulties.

Conclusions: We used an unsupervised clustering algorithm that can distinguish between GU cancer cases with high and low expenditure based solely on sociodemographic variables. Our findings were consistent across all three data cohorts. This approach can predict the disparity in utilization in any insured population. [Table: see text]
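
As an illustration of the weighted descriptive step described in the Methods, the sketch below computes a survey-weighted mean expenditure per cluster. It is a minimal example under stated assumptions, not the authors' code: the function name `weighted_mean_by_cluster`, the record layout, and the toy values are all hypothetical, and the abstract does not say which software or clustering algorithm was used. The sketch applies person-level weights only; it does not use strata or primary sampling units, so it would not reproduce design-based confidence intervals.

```python
from collections import defaultdict


def weighted_mean_by_cluster(records):
    """Return {cluster: weighted mean expenditure}.

    Each record is a (cluster, expenditure, weight) tuple, where the weight
    stands in for a person-level survey weight (hypothetical layout).
    """
    totals = defaultdict(float)   # sum of weight * expenditure per cluster
    weights = defaultdict(float)  # sum of weights per cluster
    for cluster, expenditure, weight in records:
        totals[cluster] += weight * expenditure
        weights[cluster] += weight
    return {c: totals[c] / weights[c] for c in totals}


# Made-up toy records, not data from the study.
toy = [
    ("low", 10000.0, 2.0),
    ("low", 20000.0, 1.0),
    ("high", 30000.0, 1.0),
    ("high", 50000.0, 3.0),
]
means = weighted_mean_by_cluster(toy)
```

Each respondent's expenditure counts in proportion to the number of people that respondent represents, which is why a weighted mean can differ noticeably from the unweighted average of the same records.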