Abstract
As single-cell RNA sequencing (scRNA-seq) has emerged as a great tool for studying cellular heterogeneity within the past decade, the number of available scRNA-seq datasets also rapidly increased. However, reuse of such data is often problematic due to a small cohort size, limited cell types, and insufficient information on cell type classification. Here, we present a large integrated scRNA-seq dataset containing 224,611 cells from human primary non-small cell lung cancer (NSCLC) tumors. Using publicly available resources, we pre-processed and integrated seven independent scRNA-seq datasets using an anchor-based approach, with five datasets utilized as reference and the remaining two, as validation. We created two levels of annotation based on cell type-specific markers conserved across the datasets. To demonstrate usability of the integrated dataset, we created annotation predictions for the two validation datasets using our integrated reference. Additionally, we conducted a trajectory analysis on subsets of T cells and lung cancer cells. This integrated data may serve as a resource for studying NSCLC transcriptome at the single cell level.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.