Understanding the heterogeneity of a population at risk is an important step in the early detection of gastric cancer. We aimed to cluster demographic, hematological, and biochemical markers of gastric cancer in a heterogeneous sample of patients. We analyzed data from 695 adult patients (50.0% women) who had histologically confirmed gastric cancer or benign gastric disease, or who were identified as healthy individuals (December 2018 to August 2019, Hangzhou, China). We conducted hierarchical clustering using a factorial analysis of mixed data. To assess the clustering scheme, we also developed a machine-learning classification model using the Extreme Gradient Boosting algorithm and subsequently ranked the variables for differentiating patient phenotypes. Three clusters were identified using patient characteristics. The classification model showed high performance (multi-class AUC = 0.921) in recognizing the clusters. The five most important variables for differentiating the clusters, in decreasing order of importance, were sex, hemoglobin, albumin, creatinine, and high-density lipoprotein (all ANOVA P < 0.001). The prevalence of gastric cancer in clusters I, II, and III was 95.8%, 53.8%, and 34%, respectively [χ²(2) = 164.050, P < 0.001]. Cluster I (N = 167) had a predominantly inflammatory profile, Cluster II (N = 240) showed metabolic disturbances, and Cluster III (N = 288) presented a relatively favorable metabolic and inflammatory profile. The population contained distinct clinical phenotypes, each with a different prevalence of gastric cancer. A combination of routine clinical data outperformed carbohydrate or carcinoembryonic antigens in capturing the heterogeneity of the population regarding gastric pathologies.
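The reported comparison of gastric cancer prevalence across the three clusters can be illustrated with a small worked example. The sketch below is not the authors' analysis: the per-cluster case counts are an assumption, reconstructed by multiplying each reported cluster size by its rounded prevalence, and the variable names are hypothetical. It computes a Pearson chi-square statistic on the resulting 3 x 2 table (cluster by cancer status), which should land near the reported value if the reconstruction is reasonable.

```python
import math

# Reported in the abstract: cluster sizes and gastric cancer prevalence.
cluster_sizes = {"I": 167, "II": 240, "III": 288}
prevalence = {"I": 0.958, "II": 0.538, "III": 0.34}

# ASSUMPTION: case counts are reconstructed from rounded percentages,
# so they may differ slightly from the authors' raw data.
observed = {}
for name, n in cluster_sizes.items():
    cancer = round(n * prevalence[name])
    observed[name] = (cancer, n - cancer)  # (cancer, no cancer)

total = sum(cluster_sizes.values())
total_cancer = sum(cancer for cancer, _ in observed.values())
col_totals = (total_cancer, total - total_cancer)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells,
# where expected = row total * column total / grand total.
chi2 = 0.0
for name, row in observed.items():
    for obs, col_total in zip(row, col_totals):
        expected = cluster_sizes[name] * col_total / total
        chi2 += (obs - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(col_totals) - 1)

# For exactly 2 degrees of freedom, the chi-square upper-tail probability
# has the closed form exp(-x / 2), so no statistics library is needed.
p_value = math.exp(-chi2 / 2)

print(f"chi2({dof}) = {chi2:.3f}, p = {p_value:.3g}")
```

The closed-form tail probability applies only because the table has 2 degrees of freedom; for other table shapes a library routine such as `scipy.stats.chi2_contingency` would be the usual choice.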