Abstract

Purpose: Genomic data analysis is central to cancer diagnosis and therapy. We convert tabular genomic data into a two-dimensional (2D) image format that exploits correlations between features, so that 2D convolutional neural networks (CNNs) can analyze the resulting structured images. The aim is to improve the accuracy and utility of genomic data in clinical decision-making.

Methodology: The core of our method is a novel transformation of tabular genomic data into an image format. We start with m patient samples, each characterized by n genes, and compute the pairwise correlation matrix of the genes to capture the relationships among them. We also construct a Euclidean distance matrix holding the distances between n points on a 2D grid. We then minimize the Gromov-Wasserstein discrepancy between these two matrices, which yields a transformation matrix, T. Applying T to the tabular data produces m images, one per sample. These images are processed with a 10-layer CNN comprising a convolutional layer (3 × 3 kernel), three dense layers, two ReLU layers, and a dropout layer.

Data Sources: We created images, termed 'genomaps', from genomic data collected from three patient groups: 130 individuals (cancerous and normal) from Stanford Hospital, 230 breast cancer patients from The Cancer Genome Atlas (TCGA), and 1572 patients from the Memorial Sloan Kettering (MSK) Cancer Center. The genomaps are visually interpretable and clearly differentiate cancerous from non-cancerous samples, which is pivotal for accurate analysis.
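The transformation described under Methodology can be illustrated with a small numerical sketch. It is not the authors' implementation: it runs on random data, uses a basic entropic Gromov-Wasserstein iteration (Sinkhorn projections with a square loss) in place of whatever solver the study used, and assumes that the correlation matrix is first converted to a dissimilarity, 1 - |r|, so that it is comparable with grid distances. The function names, the regularization strength `epsilon`, and the soft mapping `X @ (n * T)` are all assumptions made for illustration.

```python
import numpy as np

def grid_distance_matrix(height, width):
    """Euclidean distances between the pixel positions of a height x width grid."""
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def entropic_gw(C1, C2, epsilon=0.05, outer_iters=50, sinkhorn_iters=200):
    """Approximate a square-loss Gromov-Wasserstein coupling between C1 and C2.

    Rows of the returned coupling index the points of C1 (genes here),
    columns index the points of C2 (pixels here). Uniform marginals are assumed.
    """
    n1, n2 = C1.shape[0], C2.shape[0]
    p = np.full(n1, 1.0 / n1)
    q = np.full(n2, 1.0 / n2)
    T = np.outer(p, q)  # start from the independent coupling
    # Part of the square-loss cost that does not depend on T.
    const = ((C1 ** 2) @ p)[:, None] + ((C2 ** 2) @ q)[None, :]
    for _ in range(outer_iters):
        # cost[i, j] = sum_kl (C1[i, k] - C2[j, l])^2 * T[k, l]
        cost = const - 2.0 * C1 @ T @ C2.T
        K = np.exp(-cost / epsilon)
        u = np.ones(n1)
        for _ in range(sinkhorn_iters):
            v = q / (K.T @ u)
            u = p / (K @ v)
        T = u[:, None] * K * v[None, :]
    return T

# Toy data standing in for m patient samples with n genes each.
rng = np.random.default_rng(0)
m, height, width = 20, 4, 4
n = height * width
X = rng.normal(size=(m, n))

# Pairwise gene-gene correlation, converted to a dissimilarity (assumption).
corr = np.corrcoef(X, rowvar=False)
gene_dissim = 1.0 - np.abs(corr)

# Distances between the n pixel positions of the 2D grid.
grid_dist = grid_distance_matrix(height, width)

# Align the two matrices; both are scaled to [0, 1] for numerical stability.
T = entropic_gw(gene_dissim / gene_dissim.max(), grid_dist / grid_dist.max())

# Soft assignment of genes to pixels: each row of n * T sums to 1.
images = (X @ (n * T)).reshape(m, height, width)
print(images.shape)
```

Because the coupling is soft, each pixel is a weighted mixture of genes. A one-gene-per-pixel layout would need an extra step that turns T into a permutation, which this sketch leaves out.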
Outcomes: Our method improves survival prediction accuracy by 8% compared with existing methods. Using DeepSHAP, we identified critical genes with greater precision than conventional methods. The p53 gene family, notably TP53 and TP63, was identified as significantly mutated in over 50% of all cancer types analyzed. Another critical finding was the ERBB gene family (EGFR, ERBB2, ERBB3, and ERBB4), which plays a dual role: it drives tumor proliferation and influences the immune response against tumors, a factor critical for immunotherapy strategies.

Conclusion: This research introduces an approach that represents genomic data as images and analyzes them with 2D CNNs. It sets a new benchmark in classification and regression accuracy and offers a more interpretable pathway for biomarker discovery, contributing to personalized medicine and cancer research.

Citation Format: Md Tauhidul Islam, Lei Xing. Transforming genomic data into images for enhanced deep learning in precision oncology [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3532.