Leaf Nodes Research Articles

Abstract: Single-cell technologies represent a revolutionary approach to resolving cell-type heterogeneity, identifying cells in specialized states, and detecting rare disease-associated cells. With the cost of single-cell technology decreasing substantially, its integration into clinical studies is gaining momentum. A new computational tool is needed to accommodate different single-cell genomics and clinical data formats while accounting for unwanted confounders. This study aims to develop a tree-based machine learning model that leverages the unprecedented resolution of single-cell multi-omics data to delineate the genomic and phenotypic drivers behind diverse immunotherapy responses. The proposed model, single-cell analysis of Clinical Tree (scanCT), is inspired by the Generalized Unbiased Interaction Detection and Estimation method for unbiased gene and protein feature selection and easy interpretation. At each tree node, scanCT learns from the data to select the genomic feature that best splits cells with distinct clinical responses. Confounding factors serve as regressors within the nodes but are not used for branch splitting, while gene and protein features of interest split the tree but do not enter the regression model in each node. scanCT is built to be free from selection bias toward variables with a larger number of categories or values. With tree pruning and cross-validation, scanCT overcomes over-fitting and enhances model generalization, especially for clinical studies with a limited number of patients. In particular, scanCT naturally fits the hierarchical cell type relationship and handles marker gene and protein interaction effects efficiently. Our approach was tested on single-cell datasets from B-cell malignancy patients undergoing Chimeric Antigen Receptor (CAR)-T cell therapy. The results from scanCT are highly interpretable. For instance, each branch is a gene-protein combination profile, and cells are naturally partitioned by clinical association. The linear regressions at each leaf node give the clinical predictions for cells that follow the splitting criteria. The regression intercept is an average estimate of toxicity (e.g., neurotoxicity) or efficacy after controlling for confounders (e.g., tumor burden). scanCT accommodates categorical or continuous clinical response and survival data and is robust to missing values, a frequent challenge in oncological studies. scanCT represents a significant step forward in single-cell data analysis, merging complex genotypic and phenotypic information with clinical outcomes. The efficacy- and toxicity-associated genomic signatures will inform new manufacturing strategies to optimize CAR-T cell therapy products. The model and its clinical association detections are expected to extend beyond the B-cell malignancy field to benefit the broader cancer research community.
Citation Format: Ye Zheng, Long Nguyen, Peigen Zhou, Alexandre V. Hirayama. ScanCT: A tree-based machine learning model to detect single-cell genomic features associated with clinical outcomes [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7352.
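The separation of roles described in the scanCT abstract (a gene or protein feature chooses the split, while a confounder is regressed on inside each leaf) can be illustrated with a minimal one-split sketch. This is not the scanCT implementation: the function names and the toy data are hypothetical, and the split is chosen by a plain exhaustive search over thresholds that minimizes the summed squared error of the leaf regressions, which is simpler than, and does not reproduce, the unbiased variable selection of the Generalized Unbiased Interaction Detection and Estimation method.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = 0.0 if sxx == 0 else sxy / sxx
    return my - slope * mx, slope


def sse(xs, ys, line):
    """Sum of squared residuals of a fitted (intercept, slope) line."""
    intercept, slope = line
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def best_split(feature, confounder, response, min_leaf=2):
    """Pick the threshold on `feature` that minimizes total leaf SSE.

    `feature` is used only for splitting; `confounder` is used only as the
    regressor inside each leaf. Returns (total_sse, threshold, left, right),
    where left/right are (intercept, slope) pairs, or None if no split fits.
    """
    best = None
    values = sorted(set(feature))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [i for i, f in enumerate(feature) if f <= threshold]
        right = [i for i, f in enumerate(feature) if f > threshold]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        total, lines = 0.0, []
        for idx in (left, right):
            xs = [confounder[i] for i in idx]
            ys = [response[i] for i in idx]
            line = fit_line(xs, ys)
            total += sse(xs, ys, line)
            lines.append(line)
        if best is None or total < best[0]:
            best = (total, threshold, lines[0], lines[1])
    return best


# Hypothetical toy data: eight cells, one marker gene, one confounder.
gene = [0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4]
tumor_burden = [1, 2, 3, 4, 1, 2, 3, 4]
toxicity = [1.5, 2.0, 2.5, 3.0, 5.5, 6.0, 6.5, 7.0]

result = best_split(gene, tumor_burden, toxicity)
print(result)
```

In this toy setup the two leaves share the same confounder slope and differ only in intercept, which mirrors the abstract's reading of the leaf intercept as the outcome level after controlling for the confounder.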

Abstract: Southwest Asian/North African (SWANA) communities make up over 3-4% of immigrants in the U.S., yet their health status is largely unknown because these ethnic groups are misclassified within the U.S. racial schema as White, rendering them ‘invisible minorities’. Administrative forms specify that White includes Middle Eastern, but SWANA persons may also self-identify as Black, Asian, or Other. With the rise of Islamophobia and increased U.S. intervention in the Middle Eastern region, SWANA Americans face unique challenges that require a deeper understanding of their health status. One methodology for obtaining cancer statistics on SWANA populations is the use of naming algorithms. Like SWANA communities, the Latine population was invisible in administrative data prior to the 1970s. Grassroots efforts and advocacy from the Latine community led to the development of validated Latine surname algorithms, which have been implemented by the National Cancer Institute. Similarly, SWANA activists have advocated for the creation of a federal identification category for over 50 years, arguing that SWANA communities are not perceived as White due, in large part, to a long-standing history of political racism in the United States. The purpose of this study was to develop a SWANA Surname Algorithm (SSA) to inclusively identify SWANA individuals in cancer health data. We used surnames by country of descent to train interpretable decision trees that distinguish SWANA from non-SWANA individuals by iteratively selecting the surname roots at which splitting the data best separates SWANA individuals from others. We integrated these patterns into our SSA so that, when presented with a new surname, the algorithm simply follows the decision patterns down to a leaf node, which holds the predicted class (SWANA vs non-SWANA). We developed a preliminary SWANA Surname List (SSL) using publicly available naming databases by country of origin (N=71,300). We cross-referenced the SSL against the VCU Massey Cancer Center data repository and found that 4.9% of all cancer patients from 2016 to 2020 matched as SWANA. Notably, the prevalence of SWANA patients has been increasing over the last few decades, from 3.8% in 1991-1995 to 4.2% in 2001-2005 and, most recently, 4.9% in 2016-2020. We will use our SSA to validate these findings. These preliminary findings underscore the valuable insights that naming algorithms can provide in elucidating the true demographic composition of cancer patients. Lack of racial/ethnic disaggregation perpetuates existing inequities in access to essential health resources among SWANA communities. The inclusion of SWANA in cancer disparities research would allow researchers to better examine the cancer health status of this underrepresented but growing community while also aligning with the true racialization of SWANA in the United States.
Citation Format: Guleer Shahab, Michael Preston. Decolonizing data: Diversifying cancer registries to include SWANA [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 820.
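The prediction step described in the SWANA abstract, following decision patterns down to a leaf node that holds the predicted class, can be sketched as a simple tree walk. This is an illustration only, not the authors' SSA: the `Node` structure and `predict` function are assumptions, the tree is hand-built rather than learned, and the surname roots are deliberately meaningless placeholder strings, not real findings about any surname.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Internal nodes test one surname root; leaf nodes carry only a label."""
    root: Optional[str] = None     # substring tested at an internal node
    yes: Optional["Node"] = None   # branch taken when the surname contains the root
    no: Optional["Node"] = None    # branch taken otherwise
    label: Optional[str] = None    # predicted class, set only on leaf nodes


def predict(node: Node, surname: str) -> str:
    """Walk from the top of the tree to a leaf and return the leaf's class."""
    name = surname.lower()
    while node.label is None:
        node = node.yes if node.root in name else node.no
    return node.label


# Hand-built toy tree with placeholder roots ("zzq", "qxv" are hypothetical).
tree = Node(
    root="zzq",
    yes=Node(label="SWANA"),
    no=Node(
        root="qxv",
        yes=Node(label="SWANA"),
        no=Node(label="non-SWANA"),
    ),
)

for surname in ["Zzqari", "Aqxvo", "Smith"]:
    print(surname, "->", predict(tree, surname))
```

A learned tree would choose each node's root from training data by a split criterion; only the traversal to a leaf is shown here.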

Articles published on Leaf Nodes

Abstract 7352: ScanCT: A tree-based machine learning model to detect single-cell genomic features associated with clinical outcomes

Abstract 820: Decolonizing data: Diversifying cancer registries to include SWANA

Counting Rules for Computing the Number of Independent Sets of a Grid Graph

Effects of uncommon non-isochronicities on remote synchronization

Hierarchical matrix factorization for interpretable collaborative filtering

Distributed Cooperative Driving Strategy for Connected Automated Vehicles at Unsignalized Intersections Based on Monte Carlo Method

Weighted omnidirectional semi-global stereo matching

High-risk sexual behaviors of HIV/AIDS and related factors in young students in Guangzhou

A dynamic approach for visualizing and exploring concept hierarchies from textbooks.

DendroX: multi-level multi-cluster selection in dendrograms

Subspace learning machine (SLM): Methodology and performance evaluation

Overexpression of black rice OsC1 confers tissue-specific anthocyanin accumulation in indica rice cv. Kasalath and its potential use as a visible marker in rice transformation

Pulsed sounds caused by internal oxygen transport during photosynthesis in the seagrass Halophila ovalis.

Two-level optimization by differential evolution in decision tree learning algorithm

Computer Go Research Based on Variable Scale Training and PUB-PMCTS

Research on the Optimization of Enterprise Resource Economic Benefits and Management Costs in Cloud Computing Environment

AN IMPROVED INDEXING METHOD FOR QUERYING BIG XML FILES

CareerMiner: Automatic extraction of professional network from large Chinese resume data

Partial remote synchronization in star-like networks with partial connections among leaf nodes

Modified graph-based algorithm to analyze security threats in IoT.
