Lung adenocarcinoma (LUAD) is one of the most common and lethal types of lung cancer. Oxidative stress, which arises from an imbalance between the production of reactive oxygen species (ROS) and the cell's antioxidant defenses, is considered a promising target for cancer therapy. Immune checkpoint blockade (ICB) therapy is currently being explored as a potentially effective treatment for early-stage LUAD. In this study, we aimed to identify distinct subtypes of LUAD patients using genes associated with oxidative stress and immunotherapy, and to propose subtype-specific therapeutic strategies.

We searched the Gene Expression Omnibus (GEO) for datasets that contained both expression data and survival information, and selected genes associated with oxidative stress and immunotherapy through keyword searches on GeneCards. Expression data of LUAD samples from The Cancer Genome Atlas (TCGA) and 11 GEO datasets were combined into a unified dataset, which was then randomly split into two equal halves, Dataset_Training and Dataset_Testing. We applied consensus clustering (CC) analysis to identify LUAD subtypes within Dataset_Training and compared the subtypes in terms of oxidative stress levels, the tumor microenvironment (TME), and immune checkpoint genes (ICGs). By combining feature selection with machine learning, we constructed subtype classifiers and retained the models with the highest accuracy. The subtypes and models derived from Dataset_Training were validated in Dataset_Testing. The gene with the highest importance value in the machine learning model was designated the hub gene, and virtual screening was used to identify candidate compounds targeting it.
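The random split and the clustering step can be illustrated with a minimal, standard-library Python sketch of resampling-based consensus clustering. It is not the authors' pipeline: the abstract does not state the clustering software, base algorithm, resampling fraction, or number of repetitions, so the choices here (k-means with farthest-first seeding as the base clusterer, 100 resamples of 80% of samples, average linkage on 1 minus consensus) are assumptions, and the expression values are synthetic toy data. The helper names `split_half`, `consensus_matrix`, and `average_linkage` are hypothetical.

```python
import random

def split_half(sample_ids, seed=0):
    """Randomly split sample IDs into two equal halves: (training, testing)."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, n_iter=20):
    """Lloyd's k-means with farthest-first seeding; returns one label per point."""
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        # Next seed: the point farthest from every seed chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def consensus_matrix(data, k, n_reps=100, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of samples shares a cluster,
    counted only over resamples that contain both samples (assumed settings)."""
    rng = random.Random(seed)
    n = len(data)
    m = max(k, int(frac * n))
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_reps):
        idx = rng.sample(range(n), m)
        labels = kmeans([data[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += int(labels[a] == labels[b])
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def average_linkage(dist, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
        clusters[a] = merged
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

# Hypothetical toy data: 40 samples x 3 genes drawn around two well-separated means.
rng = random.Random(42)
truth = {f"S{i}": "A" if i < 20 else "B" for i in range(40)}
expr = {s: [rng.gauss(0.0 if g == "A" else 10.0, 0.5) for _ in range(3)]
        for s, g in truth.items()}

train, test = split_half(truth, seed=1)  # 20 training / 20 testing samples
C = consensus_matrix([expr[s] for s in train], k=2)
labels = average_linkage([[1.0 - c for c in row] for row in C], k=2)
print(dict(zip(train, labels)))
```

In this toy example, samples from different groups are never co-clustered when resampled together, so their consensus is 0 and their distance is 1, while same-group pairs score close to 1; cutting the tree at k = 2 therefore recovers the two groups. In real use the number of subtypes is usually chosen by comparing consensus stability across several values of k, which this sketch does not attempt.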
The unified dataset integrated 2,154 LUAD samples from TCGA-LUAD and 11 GEO datasets, and 1,311 genes associated with immune and oxidative stress processes were selected. Consensus clustering of the expression data of these genes identified two distinct subtypes in Dataset_Training that differed in immune and oxidative stress pathway activity; we therefore named them the OX+ and IM+ subtypes. The OX+ subtype showed higher oxidative stress levels and was associated with a worse prognosis than the IM+ subtype, whereas the IM+ subtype showed higher levels of immune pathways, immune cells, and ICGs than the OX+ subtype. These findings were reproduced in Dataset_Testing. Gene selection yielded an optimal combination of 12 genes for predicting LUAD subtypes: ACP1, AURKA, BIRC5, CYC1, GSTP1, HSPD1, HSPE1, MDH2, MRPL13, NDUFS1, SNRPD1, and SORD. Of the four machine learning models tested, the support vector machine (SVM) performed best, achieving an area under the curve (AUC) of 0.86 and an accuracy of 0.78 on Dataset_Testing. HSPE1, which had the highest importance in the SVM model, was designated the hub gene, and docking structures were computed for four candidate compounds: ZINC3978005 (Dihydroergotamine), ZINC52955754 (Ergotamine), ZINC150588351 (Elbasvir), and ZINC242548690 (Digoxin). In conclusion, our study identified two subtypes of LUAD patients based on oxidative stress and immunotherapy-related genes, and our findings provide subtype-specific therapeutic strategies.
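The classification and evaluation step can be sketched in standard-library Python as follows. The abstract does not give the SVM kernel, hyperparameters, software, or how feature importance was scored, so this is only an illustration under stated assumptions: a linear SVM fitted with the Pegasos stochastic subgradient method (swapped in for whatever solver the authors used), AUC computed as the Mann-Whitney probability that a positive sample outscores a negative one, and the absolute weight of each standardized feature used as a hypothetical importance measure, which is one possible way to rank a hub gene and not necessarily the authors' method. The data are synthetic, with four toy features rather than the 12-gene panel.

```python
import random
import statistics

def standardize(train_X, test_X):
    """Z-score each feature using training-set statistics only, then apply to both sets."""
    cols = list(zip(*train_X))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant features
    def z(X):
        return [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in X]
    return z(train_X), z(test_X)

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2) fitted by Pegasos subgradient descent.
    Labels are in {-1, +1}; a constant 1 is appended to each row as a bias feature."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            xi = X[i] + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the L2 penalty
            if margin < 1:  # hinge loss active: step toward classifying x_i correctly
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w

def decision(w, x):
    return sum(wj * xj for wj, xj in zip(w, x + [1.0]))

def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 200 samples x 4 features; only feature 0 carries subtype signal.
rng = random.Random(7)
y = [1 if i % 2 == 0 else -1 for i in range(200)]
X = [[1.5 * yi + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(3)] for yi in y]
idx = list(range(200))
rng.shuffle(idx)
tr, te = idx[:100], idx[100:]
X_tr, X_te = standardize([X[i] for i in tr], [X[i] for i in te])
y_tr, y_te = [y[i] for i in tr], [y[i] for i in te]

w = train_linear_svm(X_tr, y_tr)
scores = [decision(w, x) for x in X_te]
auc = roc_auc(scores, y_te)
acc = sum((s >= 0) == (lab == 1) for s, lab in zip(scores, y_te)) / len(y_te)
importance = [abs(wj) for wj in w[:-1]]  # |weight| per feature, bias excluded
print(f"AUC={auc:.2f}  accuracy={acc:.2f}  top feature={importance.index(max(importance))}")
```

Fitting the standardization on the training half only mirrors the separation between Dataset_Training and Dataset_Testing, so the held-out AUC and accuracy are not inflated by test-set statistics.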