Abstract

This study aims to create a tumor heterogeneity-based model for predicting the best features of lung adenocarcinoma (LUAD) in multiple cancer subtypes using the Least Absolute Shrinkage and Selection Operator (LASSO). The RNASeq data of 533 LUAD cancer samples were downloaded from the TCGA database. Subsequent to the identification of differentially expressed genes (DEGs), the samples were divided into two subtypes based on the consensus clustering method. The subtypes were estimated with the abundance of immune and non-immune stromal cell populations which infiltrated tissue. LASSO model was established to predict each subtype's best genes. Enrichment pathway analysis was then carried out. Finally, the validity of the LUSC model for identifying features was established by the survival analysis.89 and 444 samples were clustered in Subtype-1 and Subtype-2 groups respectively. DEG analysis was performed on each subtype. A standard cutoff was applied and in total, 2033 genes were upregulated and 505 were downregulated in case of subtype-1 and 5039 genes were upregulated and 1219 were downregulated in case of subtype-2. LASSO model was established to predict the best features from each subtypes, 40 and 43 most relevant genes were selected in subtype-1 and subtype-2. The abundance of tissue-infiltrates analysis distinguished the subtypes based on the expression pattern of immune infiltrates. Survival analysis showed that this model could effectively predict the best and distinct features in cancer subtypes. The study suggests that unsupervised clustering and Machine learning methods such as LASSO model-based feature selection can be effectively used to predict relevant genes which might play an essential role in cancer diagnosis.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call