Background
An increasing proportion of lung adenocarcinoma (LUAD) occurs in patients even after they have stopped smoking. Here, we aimed to determine whether tobacco smoking-induced changes across LUADs from patients who formerly smoked (FS) correspond to different biological and clinical factors.

Methods
Random forest models (RFs) were trained using a smoking-associated signature developed from genes differentially expressed between LUAD patients who had never smoked (NS) and those who currently smoked (CS) in the TCGA (n = 193) and BCCA (n = 69) cohorts. The RFs were subsequently applied to 299 and 131 FS patients from the TCGA and MSKCC cohorts, respectively. FS patients were classified by the RFs as either CS-like or NS-like, and associations with patient characteristics, biological features, and clinical outcomes were determined.

Results
We identified a 123-gene signature that robustly classified NS and CS patients in both RNA-seq (AUC = 0.85) and microarray (AUC = 0.92) validation test sets. In the TCGA cohort, the RF classified 213 FS patients as CS-like and 86 as NS-like. CS-like and NS-like status in FS patients correlated poorly with patient characteristics, but the two groups had substantially different biological features, including tumor mutational burden, number of mutations, mutagenic signatures, and immune cell populations. NS-like FS patients had 17.5 months and 18.6 months longer overall survival than CS-like FS patients in the TCGA and MSKCC cohorts, respectively.

Conclusions
Patients with LUAD who had formerly smoked harbor heterogeneous tumor biology. These patients can be divided by smoking-induced gene expression to inform prognosis and the underlying biological characteristics relevant to treatment selection.
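The validation and classification steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a trained classifier has already produced, for each tumor, a probability of being CS, and it shows only how an AUC could be computed on NS/CS validation samples and how FS patients could then be labeled. The function names, the example scores, and the 0.5 cut-off are all hypothetical; the abstract does not state how model output was converted into CS-like and NS-like labels.

```python
from collections import Counter


def auc(cs_scores, ns_scores):
    """Rank-based AUC: the probability that a randomly chosen CS sample
    receives a higher score than a randomly chosen NS sample.
    Ties count as half a win."""
    wins = 0.0
    for cs in cs_scores:
        for ns in ns_scores:
            if cs > ns:
                wins += 1.0
            elif cs == ns:
                wins += 0.5
    return wins / (len(cs_scores) * len(ns_scores))


def label_former_smokers(cs_probabilities, threshold=0.5):
    """Assign each FS patient a label from the predicted CS probability.
    The 0.5 threshold is an assumption made for illustration."""
    return ["CS-like" if p >= threshold else "NS-like" for p in cs_probabilities]


# Hypothetical validation scores (predicted probability of CS).
validation_cs = [0.9, 0.8, 0.7, 0.4]
validation_ns = [0.6, 0.3, 0.2, 0.1]
validation_auc = auc(validation_cs, validation_ns)
print(validation_auc)  # 0.9375 (15 of 16 CS/NS pairs ranked correctly)

# Hypothetical predicted probabilities for five FS patients.
fs_probabilities = [0.81, 0.35, 0.50, 0.12, 0.66]
fs_labels = label_former_smokers(fs_probabilities)
label_counts = Counter(fs_labels)
print(fs_labels)
print(label_counts)
```

In this toy example, three of the five FS patients are labeled CS-like and two NS-like. In a real analysis the resulting groups would then be compared on survival and tumor features, as the abstract describes.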