Abstract

Background- Cervical cancer, the fourth most common cancer among women worldwide, is highly preventable when diagnosed early. Emerging technologies, including bioinformatics and machine learning, are transforming the discovery and development of novel cancer biomarkers. Our study uniquely integrates artificial neural networks with RNA-sequencing data from cervical cancer patients. This combination improves the accuracy and reliability of biomarker prediction and provides a better understanding of the potential clinical utility of the identified biomarkers.

Methods- RNA-sequencing data and clinicopathological details of 304 cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) cases were obtained from the GDAC database (https://gdac.broadinstitute.org/). Differentially expressed genes (DEGs) were identified using thresholds of P < 0.05, |log2 fold change (FC)| > 1.5, and false discovery rate (FDR) < 0.05. Pathway enrichment analysis was performed, and protein-protein interaction networks of the DEGs were constructed using the STRING database. Prognostic biomarkers were identified using Kaplan-Meier and Cox proportional hazards analyses adjusted for potential confounders such as age, disease stage, and comorbidities (HR < 1, p < 0.05). Deep learning algorithms were applied to identify predictive markers, using Weight by Correlation for feature selection. The model was evaluated on a 70-30 training-test split using AUC (area under the curve), accuracy, MSE (mean squared error), and R2 (R-squared) as metrics. Diagnostic biomarkers were assessed with a combined receiver operating characteristic (ROC) curve and validated externally using a GEO dataset of cervical cancer patients.

Results- We identified 4153 DEGs. Pathway analysis revealed that key dysregulated genes play a pivotal role in extracellular matrix organization.
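The Methods give the DEG thresholds but not how the FDR was computed or which test produced the p-values. As a minimal Python sketch of the filtering step only, the block below assumes per-gene log2 fold changes and raw p-values are already available and applies Benjamini-Hochberg adjustment (an assumption; the abstract does not name the method). The gene names and values are hypothetical placeholders, not study data.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(
    genes: Dict[str, Tuple[float, float]],
    p_cut: float = 0.05,
    lfc_cut: float = 1.5,
    fdr_cut: float = 0.05,
) -> Dict[str, str]:
    """genes maps name -> (log2 fold change, raw p-value).

    Returns name -> 'up' or 'down' for genes passing all three cutoffs
    stated in the abstract (P, |log2 FC|, FDR).
    """
    names = list(genes)
    pvals = [genes[g][1] for g in names]
    qvals = benjamini_hochberg(pvals)  # assumed FDR method
    result = {}
    for name, p, q in zip(names, pvals, qvals):
        lfc = genes[name][0]
        if p < p_cut and q < fdr_cut and abs(lfc) > lfc_cut:
            result[name] = "up" if lfc > 0 else "down"
    return result


# Hypothetical input: (log2 fold change, raw p-value) per gene.
genes = {
    "GENE_A": (2.3, 0.0001),
    "GENE_B": (-1.8, 0.002),
    "GENE_C": (0.9, 0.00001),  # fails the fold-change cutoff
    "GENE_D": (2.5, 0.03),
    "GENE_E": (2.0, 0.045),    # passes raw P but fails FDR after adjustment
    "GENE_F": (1.7, 0.6),      # not significant
}
print(select_degs(genes))
```

Under Benjamini-Hochberg an adjusted value is never smaller than its raw p-value, so the P < 0.05 filter is already implied by FDR < 0.05; the sketch keeps both only to mirror the criteria as stated.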
Survival analysis showed that six upregulated genes (CT62, SLC7A5P1, C10orf110, FDPSL2A, TNFSF15, and TTLL13) and seven downregulated genes (BAI3, CBX7, DKFZp566F0947, PCDHB18, GRAPL, PCDHB19P, and C4orf38) were associated with decreased overall survival. The machine learning model demonstrated high predictive accuracy (AUC = 1, accuracy = 99.02%, R2 = 0.99) and identified twenty genes positively correlated with cervical cancer risk. Notably, CBX7 emerged as both a prognostic and a diagnostic biomarker (AUC = 0.99, sensitivity = 0.93, specificity = 1.00).

Conclusion- Our study uncovers the prognostic significance of two novel genes in cervical cancer: CBX7, a key regulator of tumor suppression, and PCDHB18, a member of the protocadherin beta gene cluster that functions as a cell adhesion molecule. Downregulation of these genes is associated with decreased overall survival. Further functional analyses and validation of these candidate biomarkers are needed to fully assess their clinical value in cervical cancer.

Citation Format: Ghazaleh Pourali, Mohsen Zeinali, Mahshid Arastonejad, Nima Khalili-Tanha, Elham Nazari, Ghazaleh Khalili-Tanha, Adetunji T. Toriola. Identification of CBX7 and PCDHB18 as novel prognostic biomarkers of cervical cancer: RNA-sequencing and machine learning analysis [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4930.
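For a single gene such as CBX7, a diagnostic AUC with sensitivity and specificity is commonly obtained from an ROC curve of that gene's expression, with sensitivity and specificity read off at one cutoff. The abstract does not say how its cutoff was chosen or how the combined ROC curve was built, so the Python sketch below is illustrative only: it assumes a single marker, picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1), and uses hypothetical expression values. Because the gene is downregulated in tumors, expression is negated so that a higher score indicates tumor.

```python
from typing import List, Tuple


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_cutoff(scores: List[float], labels: List[int]) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    A sample is called positive when score >= threshold. Assumes both
    classes are present in labels.
    """
    best = (float("inf"), 0.0, 1.0)
    best_j = -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        j = sens + spec - 1
        if j > best_j:
            best_j, best = j, (t, sens, spec)
    return best


# Hypothetical expression of a downregulated marker; 1 = tumor, 0 = normal.
expression = [2.1, 1.8, 2.5, 4.6, 1.2, 4.4, 5.0, 4.8, 5.6]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]
scores = [-x for x in expression]  # lower expression -> higher tumor score

auc = roc_auc(scores, labels)
cutoff, sens, spec = best_cutoff(scores, labels)
print(f"AUC={auc:.2f}, call tumor if expression <= {-cutoff}, "
      f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The pairwise AUC loop is quadratic in sample size, which is fine for a few hundred samples; a rank-based computation or a library routine would scale better.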