Clear cell renal cell carcinoma (ccRCC) is the most common subtype of renal cell carcinoma (RCC). Because symptoms are usually absent until advanced stages, early diagnosis of ccRCC is challenging, and novel secreted biomarkers for its early detection are urgently needed. This study aimed to identify such biomarkers for diagnosing ccRCC by applying bioinformatics and machine learning techniques to transcriptomics data. Differentially expressed genes (DEGs) in ccRCC compared to normal kidney tissues were identified using three transcriptomics datasets (GSE53757, GSE40435 and GSE11151) from the Gene Expression Omnibus (GEO). The DEGs common to all three datasets were then screened for potential secreted biomarkers using a list of human secretome proteins from The Human Protein Atlas. The recursive feature elimination (RFE) technique was used to determine the optimal number of features for building classification machine learning models. The expression levels and clinical associations of the candidate biomarkers identified with RFE were validated using transcriptomics data from The Cancer Genome Atlas (TCGA). Classification models were then developed based on the expression levels of these candidate biomarkers. Model performance was evaluated using accuracy and other evaluation metrics, confusion matrices, and receiver operating characteristic (ROC) curves with the area under the curve (AUC). We identified 44 DEGs that encode potential secreted proteins among 274 common DEGs found across all datasets. Among these, insulin-like growth factor binding protein 3 (IGFBP3) and lectin, galactoside-binding, soluble, 1 (LGALS1) were selected for further analysis using the RFE technique. Both IGFBP3 and LGALS1 showed significant upregulation in ccRCC tissues compared to normal tissues in the GEO and TCGA datasets. Survival analysis indicated that patients with higher expression levels of these genes had shorter overall survival (OS) and disease-free survival (DFS). Decision tree and random forest models based on IGFBP3 and LGALS1 levels achieved an accuracy of 98.04% and an AUC of 0.98. This study identified IGFBP3 and LGALS1 as promising novel secreted biomarkers for ccRCC diagnosis.
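The candidate-screening step described in the abstract (DEGs shared by all datasets, then restricted to the secretome list) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `common_degs` and `secreted_candidates` are hypothetical, and the gene sets are toy placeholders (`GENE_A` to `GENE_D` are invented labels), not the study's actual DEG lists or the Human Protein Atlas secretome.

```python
def common_degs(deg_sets):
    """Return the genes that appear in every dataset's DEG set."""
    sets = [set(genes) for genes in deg_sets.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


def secreted_candidates(deg_sets, secretome):
    """Keep only the common DEGs that are also on the secretome list."""
    return sorted(common_degs(deg_sets) & set(secretome))


# Toy inputs for illustration only (assumed, not taken from the study).
toy_degs = {
    "GSE53757": {"IGFBP3", "LGALS1", "GENE_A", "GENE_B"},
    "GSE40435": {"IGFBP3", "LGALS1", "GENE_A", "GENE_C"},
    "GSE11151": {"IGFBP3", "LGALS1", "GENE_A", "GENE_D"},
}
toy_secretome = {"IGFBP3", "LGALS1", "GENE_D"}

candidates = secreted_candidates(toy_degs, toy_secretome)
print(candidates)  # ['IGFBP3', 'LGALS1']
```

In this toy case `GENE_A` is shared by all three datasets but is dropped because it is not on the secretome list, while `GENE_D` is on the list but is dropped because it is not a DEG in every dataset.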