Abstract

Purpose: An overarching goal of this study is to reduce disparities in lung cancer research, which commonly manifest as racial and/or gender under-representation. By employing the Gerchberg-Saxton (GS) algorithm, we aim to diminish data noise/bias in single-cell RNA sequencing (scRNA-seq) gene expression data, thereby enabling more equitable research outcomes. An immediate goal is to remove the extensive data bias associated with scRNA-seq data, thus enhancing downstream machine learning analyses.

Background: We have successfully applied the GS algorithm to obtain fairer mortality rate predictions across different racial groups [1]. In this study, we seek to apply the GS algorithm to scRNA-seq data that interrogate the cellular landscapes of lung cancer and the tumor microenvironment.

Methodology: The GS algorithm was applied to the single-cell RNA data in a series of steps. First, each data frame was transposed so that columns represent genes and rows represent single cells. The algorithm was then applied to each column separately, distributing the expression information of a given gene uniformly across all single cells in the data frame. This column-wise application was crucial to maintain the integrity of individual cell characteristics while ensuring a fair structural distribution of gene expression data. The process aimed to balance the representation of genes across all cells, addressing inherent bias and noise in the original data structure.

Data: The study utilizes a previously unpublished scRNA-seq dataset comprising 14 lung cancer patients from Wake Forest Baptist Comprehensive Cancer Center, including 6 African American and 8 Caucasian patients.

Results: One of the 14 preliminary clustering results, generated with Scanpy [2], is presented in Figure 2.
The initial findings are based on a restricted set of cell marker genes, which will be extended with additional markers in subsequent analyses. After application of the Gerchberg-Saxton algorithm, Shannon entropy [3] analysis revealed a more uniform randomness across gene expressions (Figure 1), and unsupervised clustering indicated a clearer separation of cell types, significantly improving the data's suitability for downstream analyses. The data transformation will be applied to the whole dataset for comparative analyses of cellular landscapes between lung cancers in patients of different races and genders, to be reported at the AACR 2024 Annual Meeting.

Conclusions: Our preliminary studies showed that the Gerchberg-Saxton algorithm is effective in normalizing the data distribution of scRNA-seq data, which has led to enhanced resolution of cell type differentiation in clustering analysis. With this refined methodology, we are better poised to address lung cancer health disparities revealed by single-cell sequencing analysis.

Citation Format: Seha Ay, Liang Liu, Elizabeth Forbes, Umit Topaloglu, Wei Zhang. Understanding disparities in lung cancer using single cell RNA sequencing data transformed by the Gerchberg Saxton algorithm [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 6126.
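
Illustrative sketch: The column-wise transformation described under Methodology and the entropy comparison described under Results might look roughly like the following. The abstract does not state which magnitude constraints, iteration count, or output quantity were used, so everything here is an assumption: the flat Fourier-domain target, the choice to return the final Fourier magnitude, the histogram-based entropy estimate, and the helper names `gs_transform_column`, `gs_transform_matrix`, and `shannon_entropy` are hypothetical and are not the authors' implementation.

```python
import numpy as np


def gs_transform_column(values, n_iter=100, seed=0):
    """Gerchberg-Saxton iteration on one 1-D column (hypothetical helper).

    Assumption: the source-domain magnitude is the observed column and the
    Fourier-domain target magnitude is flat, scaled to conserve energy.
    Returns the Fourier-domain magnitude reached after the final iteration.
    """
    source_amp = np.abs(np.asarray(values, dtype=float))
    n = source_amp.size
    # Flat target: with NumPy's unnormalized FFT, Parseval's theorem gives
    # each of the n bins a magnitude equal to the L2 norm of the column.
    target_amp = np.full(n, np.linalg.norm(source_amp))
    rng = np.random.default_rng(seed)
    field = source_amp * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, n))
    for _ in range(n_iter):
        spectrum = np.fft.fft(field)
        # Fourier-domain constraint: keep the phase, impose the target magnitude.
        spectrum = target_amp * np.exp(1j * np.angle(spectrum))
        field = np.fft.ifft(spectrum)
        # Source-domain constraint: keep the phase, impose the observed magnitude.
        field = source_amp * np.exp(1j * np.angle(field))
    return np.abs(np.fft.fft(field))


def gs_transform_matrix(matrix, **kwargs):
    """Apply the sketch above to each column of a cells-by-genes matrix."""
    matrix = np.asarray(matrix, dtype=float)
    columns = [gs_transform_column(matrix[:, j], **kwargs)
               for j in range(matrix.shape[1])]
    return np.column_stack(columns)


def shannon_entropy(values, bins=32):
    """Histogram-based Shannon entropy in bits (one possible estimator)."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


if __name__ == "__main__":
    # Synthetic stand-in for an expression matrix: rows are cells, columns are genes.
    rng = np.random.default_rng(42)
    counts = rng.poisson(1.0, size=(256, 4)).astype(float)
    transformed = gs_transform_matrix(counts, n_iter=50)
    for j in range(counts.shape[1]):
        before = shannon_entropy(counts[:, j])
        after = shannon_entropy(transformed[:, j])
        print(f"gene {j}: entropy before = {before:.3f} bits, after = {after:.3f} bits")
```

The sketch uses synthetic Poisson counts only to show the call pattern; it makes no claim about how entropy changes on real scRNA-seq data.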