Abstract

Independent Component Analysis (ICA) is a matrix factorization method for reducing data dimensionality. ICA has been widely applied to transcriptomic data for the blind separation of biological, environmental, and technical factors affecting gene expression. This study aimed to analyze publicly available esophageal cancer (EC) data using ICA in order to identify and comprehensively analyze reproducible signaling pathways and molecular signatures involved in this cancer type. Four independent esophageal cancer transcriptomic datasets from the GEO database were used. The bioinformatics tool «BiODICA: Independent Component Analysis of Big Omics Data» was applied to compute independent components (ICs). Gene Set Enrichment Analysis (GSEA) and ToppGene were used to identify the most significantly enriched pathways. Gene networks and graphs were constructed and visualized using Cytoscape and the HPRD database. A correlation graph between the decompositions into 30 ICs was built, retaining edges with absolute correlation values exceeding 0.3. Clusters of components (pseudocliques) were observed in the structure of the correlation graph. The 1,000 most contributing genes of each IC in the pseudocliques were mapped to the protein-protein interaction (PPI) network to construct the associated signaling pathways. Some cliques consisted of densely interconnected nodes and included components common to most cancer types (such as cell cycle and extracellular matrix signals), while others were specific to EC. The results of this investigation may reveal potential biomarkers of esophageal carcinogenesis and functional subsystems dysregulated in tumor cells, and may help predict the early development of a tumor.
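The correlation-graph step can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' code: each decomposition is represented as a hypothetical mapping from component name to per-gene weights; edges link components from different datasets when the absolute Pearson correlation over their shared genes exceeds 0.3 (absolute values because the sign of an IC is arbitrary); connected components stand in for pseudocliques, since the abstract does not say how pseudocliques were detected; and "most contributing" genes are assumed to be those with the largest absolute weight. The names `correlation_graph`, `clusters`, `top_genes`, and `dataset_A`/`dataset_B` are illustrative only.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlation_graph(decompositions, threshold=0.3):
    """Link components of different datasets whose |r| over shared genes exceeds threshold.

    decompositions: {dataset name: {component name: {gene: weight}}} (hypothetical layout)
    Returns the node list and a dict {(node_a, node_b): r}.
    """
    nodes = [(ds, ic) for ds, comps in decompositions.items() for ic in comps]
    edges = {}
    for (d1, c1), (d2, c2) in combinations(nodes, 2):
        if d1 == d2:
            continue  # assumption: only compare components across datasets
        w1, w2 = decompositions[d1][c1], decompositions[d2][c2]
        shared = sorted(set(w1) & set(w2))
        if len(shared) < 3:
            continue  # too few shared genes for a meaningful correlation
        r = pearson([w1[g] for g in shared], [w2[g] for g in shared])
        if abs(r) > threshold:  # IC signs are arbitrary, so use |r|
            edges[((d1, c1), (d2, c2))] = r
    return nodes, edges


def clusters(nodes, edges):
    """Connected components with at least one edge (a crude proxy for pseudocliques)."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, result = set(), []
    for start in nodes:
        if start in seen or not adj[start]:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        result.append(comp)
    return result


def top_genes(weights, n=1000):
    """The n genes with the largest absolute weight in one component."""
    return sorted(weights, key=lambda g: abs(weights[g]), reverse=True)[:n]


if __name__ == "__main__":
    # Toy data: IC1 in dataset_B is a sign-flipped copy of IC1 in dataset_A.
    toy = {
        "dataset_A": {"IC1": {"g1": 2.0, "g2": 1.0, "g3": -1.0, "g4": -2.0},
                      "IC2": {"g1": 1.0, "g2": -2.0, "g3": 2.0, "g4": -1.0}},
        "dataset_B": {"IC1": {"g1": -2.0, "g2": -1.0, "g3": 1.0, "g4": 2.0},
                      "IC2": {"g1": 2.0, "g2": -4.0, "g3": 4.0, "g4": -2.0}},
    }
    nodes, edges = correlation_graph(toy)
    print(clusters(nodes, edges))
    print(top_genes(toy["dataset_A"]["IC1"], n=2))
```

With real data, each dictionary would hold all genes of a 30-component decomposition, and a stricter density criterion than plain connectivity would likely be needed to separate tightly knit pseudocliques from loosely chained components.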

Highlights

  • Investigation of cancer profiles is one of the largest sources of genomic and transcriptomic research data

  • The results describe the molecular pathways derived from four esophageal cancer transcriptomic datasets

  • We focused on the deconvolution of gene expression profiles into independent components and combined those results with Gene Set Enrichment Analysis (GSEA) and ToppGene enrichment analysis
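Enrichment of a component's top genes in a pathway can be illustrated with a one-sided hypergeometric (over-representation) test. This is a minimal sketch under assumptions: it is not GSEA, which scores a ranked gene list with a running-sum statistic, and it is not claimed to reproduce ToppGene's exact procedure or its multiple-testing correction. The helper names `hypergeom_sf` and `enrichment` are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n marked items, N draws)."""
    total = comb(M, N)
    upper = min(n, N)
    # Exact integer sum of the upper tail; math.comb returns 0 when N - i > M - n.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / total


def enrichment(top, gene_set, universe):
    """Overlap size and one-sided p-value for a gene set among an IC's top genes."""
    universe = set(universe)
    top = set(top) & universe            # restrict both lists to the background
    gene_set = set(gene_set) & universe
    overlap = len(top & gene_set)
    p = hypergeom_sf(overlap, len(universe), len(gene_set), len(top))
    return overlap, p
```

For example, with a 20-gene background, a 5-gene pathway, and 5 top genes of which 4 fall in the pathway, the p-value is 76/15504 (about 0.0049). When many gene sets are tested, the p-values would still need multiple-testing correction (for example Benjamini-Hochberg).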

Summary

Introduction

Investigation of cancer profiles is one of the largest sources of genomic and transcriptomic research data. Such data have been generated and collected continuously as data computing methods and information technology have advanced. While databases are a significant resource for many cancer genomics studies, complex issues and challenges remain in extracting from these data the maximum information useful for understanding cancer biology and for clinical use (Sudhagar et al., 2018). Current analytical methods, especially in genomic studies, generate an excessively large number of variables. In such cases, unsupervised learning methods are used to reduce the dimensionality of transcriptome data and highlight significant expression patterns.
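As a rough illustration of how ICA reduces a genes × samples expression matrix to a few components, the sketch below implements a minimal symmetric FastICA in NumPy. It is a stand-in under stated assumptions, not BiODICA's implementation: genes are treated as observations, the data are PCA-whitened down to `n_components` dimensions, and a tanh contrast is used. The function name `fastica` and its parameters are hypothetical.

```python
import numpy as np


def fastica(X, n_components, max_iter=500, tol=1e-6, seed=0):
    """Minimal symmetric FastICA with a tanh contrast.

    X is a genes x samples matrix; genes are treated as observations.
    Returns S (genes x k, gene weights per component) and
    A (samples x k, sample scores) such that X_centered ~= S @ A.T.
    Assumes n_components does not exceed the rank of the centered data.
    """
    n_genes = X.shape[0]
    Xc = X - X.mean(axis=0, keepdims=True)            # center every sample column

    # PCA whitening: keep the top-k eigenvectors of the sample covariance.
    cov = Xc.T @ Xc / n_genes
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    K = eigvecs[:, order] / np.sqrt(eigvals[order])    # samples x k whitening map
    Z = Xc @ K                                         # genes x k, identity covariance

    # Symmetric fixed-point iterations starting from a random orthogonal matrix.
    rng = np.random.default_rng(seed)
    W = np.linalg.qr(rng.normal(size=(n_components, n_components)))[0]
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)                           # g(w_i^T z) for every gene
        G_prime = 1.0 - G ** 2                         # derivative of tanh
        W_new = G.T @ Z / n_genes - G_prime.mean(axis=0)[:, None] * W
        U, _, Vt = np.linalg.svd(W_new)                # symmetric decorrelation,
        W_new = U @ Vt                                 # i.e. (W W^T)^(-1/2) W
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break

    S = Z @ W.T                                        # independent components
    A = np.linalg.lstsq(S, Xc, rcond=None)[0].T        # mixing matrix, samples x k
    return S, A
```

Each column of `S` can then be ranked to pick the genes that contribute most to a component. Because the sign and scale of each component are arbitrary, comparisons between components from different runs or datasets are usually made on absolute correlations.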
