Abstract 878: Enhancing single-cell RNA sequencing analysis in cancer research: A machine learning framework based on LightGBM for automated cell type annotation

Tsung Hsien Chuang, Eric Y Chuang, Mong-Hsun Tsai, Hsiang-Han Chen, Tzu-Pin Lu, Liang-Chuan Lai

doi:10.1158/1538-7445.am2024-878

Abstract

Single-cell RNA sequencing (scRNA-seq) is widely used in cancer research to characterize gene expression diversity and cancer heterogeneity. However, manual annotation of cell types in the scRNA-seq pipeline is time-consuming and depends on the expertise of the analyst, which can significantly influence the results of downstream analyses. To address this problem, we propose a novel machine learning framework that uses the LightGBM model for automated and efficient cell-type annotation of scRNA-seq data. Two independent scRNA-seq datasets of non-small cell lung cancer (NSCLC), downloaded from the Gene Expression Omnibus (GEO), were used to train and test the model. A standard procedure was applied to both datasets for quality control and preprocessing, in which poor-quality cells with low gene expression or high scores for cellular stress or death were excluded. In addition, Harmony was applied to mitigate batch effects, which can introduce variability from non-biological experimental factors. Nine cell types (endothelial, epithelial, fibroblast, macrophage, mast, plasma, pulmonary alveolar, B, and T cells) had been manually labeled in the two datasets by the data providers, and these labels were also checked against cell-type marker genes from PanglaoDB and DAVID. The manually labeled cell types served as the ground truth for training and testing the model. In the training stage, the training dataset (85,000 cells from 44 NSCLC samples) was used to train the LightGBM model on its highly variable genes. The model was then evaluated on an independent test dataset (8,000 cells from 18 NSCLC samples) by comparing the automatically predicted cell types with the manually labeled ones.
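The comparison of predicted and manually labeled cell types described above can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say how the per-class scores were averaged, so macro-averaging (an unweighted mean over cell types) is an assumption here, and the function name `evaluate_annotations` and the six toy labels are hypothetical, not study data.

```python
from collections import Counter


def evaluate_annotations(manual, predicted):
    """Compare manual and predicted cell-type labels, one pair per cell.

    Returns overall accuracy plus macro-averaged precision and F1.
    Macro-averaging is an assumption; the abstract does not specify it.
    """
    if len(manual) != len(predicted):
        raise ValueError("manual and predicted must have the same length")
    if not manual:
        raise ValueError("at least one cell is required")

    labels = sorted(set(manual) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(manual, predicted):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # the predicted type gains a false positive
            fn[truth] += 1  # the true type gains a false negative

    precisions, f1_scores = [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        f1_scores.append(f1)

    return {
        "accuracy": sum(tp.values()) / len(manual),
        "precision": sum(precisions) / len(labels),
        "f1": sum(f1_scores) / len(labels),
    }


# Toy labels for illustration only (hypothetical, not from the study).
manual = ["T", "T", "B", "epithelial", "epithelial", "mast"]
predicted = ["T", "T", "B", "epithelial", "fibroblast", "mast"]
scores = evaluate_annotations(manual, predicted)
```

In this toy case one epithelial cell is predicted as a fibroblast, which lowers the epithelial recall and gives the fibroblast class a precision of zero. Macro-averaging therefore penalizes the error more heavily than overall accuracy does.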
In training, the model distinguished the nine cell types, achieving an overall average accuracy, F1 score, and precision of 0.86 each. On the independent test dataset, the model generalized well, showing high predictive performance across all cell types, with an average accuracy, F1 score, and precision of 0.8, 0.78, and 0.8, respectively. In the test dataset, some epithelial cells were mistakenly assigned to other cell types, possibly because the complex gene expression patterns of tumor epithelial cells make accurate prediction challenging. The proposed machine learning framework facilitates cell labeling and helps unravel the heterogeneity within lung cancer datasets. The combination of LightGBM and standardized preprocessing establishes a benchmark for high-throughput, accurate single-cell analysis, paving the way for more targeted discoveries with significant clinical impact.

Citation Format: Tsung Hsien Chuang, Liang-Chuan Lai, Tzu-Pin Lu, Mong-Hsun Tsai, Hsiang-Han Chen, Eric Y. Chuang. Enhancing single-cell RNA sequencing analysis in cancer research: A machine learning framework based on LightGBM for automated cell type annotation [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 878.
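The abstract notes that some epithelial cells in the test dataset were assigned to other cell types. One way to inspect such a pattern is to tally, for each manually labeled type, which predicted labels it received. The sketch below is a hedged illustration, not the authors' analysis: the helper names `confusion_by_true_label` and `misassignments` are hypothetical, and the toy labels are invented for the example.

```python
from collections import Counter, defaultdict


def confusion_by_true_label(manual, predicted):
    """Map each manual label to a Counter of the labels predicted for it."""
    table = defaultdict(Counter)
    for truth, guess in zip(manual, predicted):
        table[truth][guess] += 1
    return table


def misassignments(table, true_label):
    """List (predicted_label, count) pairs that differ from true_label,
    most frequent first."""
    row = table.get(true_label, Counter())
    return [(guess, n) for guess, n in row.most_common() if guess != true_label]


# Toy labels for illustration only (hypothetical, not from the study).
manual = ["epithelial"] * 5 + ["T"] * 3
predicted = [
    "epithelial", "epithelial", "fibroblast", "fibroblast", "macrophage",
    "T", "T", "T",
]
table = confusion_by_true_label(manual, predicted)
epithelial_errors = misassignments(table, "epithelial")
t_cell_errors = misassignments(table, "T")
```

Here `epithelial_errors` lists the cell types that absorbed the misclassified epithelial cells, while `t_cell_errors` is empty because every toy T cell was predicted correctly.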
