Genome Sequencing Datasets Research Articles

Abstract

Purpose of the study: Elucidating evolutionary trajectories of cancers allows us to understand the key events, and the order in which they occur, throughout their development. This can help us to find important associations with tumor progression and prognosis. Our aim was to perform de novo identification of the evolutionary trajectories within Sherlock-lung, with a dataset containing the largest collection of lung cancer in never smokers (LCINS) samples ever analyzed.

Experimental procedures: Our Plackett-Luce ordering model utilized copy number data from Battenberg and mutation cancer cell fraction (CCF) data from DPClust. Frequently-occurring copy number events and driver mutations are ordered within each sample using their copy number states and CCFs. An aggregate ordering is then calculated for a sample set. Mixture model analysis identifies subsets of samples displaying distinct orders of events, uncovering diverse evolutionary trajectories within a tumor set.

Dataset: The Sherlock-lung whole genome sequencing dataset (n=1217) was filtered to the samples that allowed us to identify subclonal expansions. Samples required at least 10 reads per chromosome copy and a minimum cellularity of 30%. This provided 458 LCINS samples of various histologies. 155 smoker samples were also analyzed for comparison.

Results: We identified two subsets of LCINS tumors following distinct evolutionary trajectories. The “loss-based” subset commonly saw whole genome duplication (WGD) combined with copy number losses occurring earlier, and at higher prevalence, than gains. Contrastingly, in the “gain-based” subset, WGD was relatively rare but ploidy increased via copy number gains, which were more prevalent than losses. Interestingly, these different trajectories converged on similar overall copy number states.

The loss-based subset had a higher mutational burden and a higher proportion of the genome altered, and followed a more smoker-like trajectory than the gain-based subset. Considering these differences alongside the convergence in copy number states, it is intriguing that survival times were similar between the two subsets. Copy number events defined the difference between the two trajectories. However, driver mutations also played important roles in tumor evolution in LCINS. TP53 and EGFR mutations were associated with greater genomic instability. Conversely, KRAS mutations were associated with more stable genomes. Samples with early clonal mutations in TP53, ERBB2, and PIK3CA, as well as those with a copy number gain of ERBB2, exhibited shorter survival times.

Conclusions: Two distinct evolutionary trajectories of LCINS were identified by de novo Plackett-Luce event ordering analysis. The contrast between the subgroups was defined by different paths of copy number activity, but they ultimately converged on similar overall copy number states and outcomes. Key early driver mutations influenced genomic instability and survival times.

Citation Format: Christopher Wirth, Tongwu Zhang, Wei Zhao, Phuc Hoang, Jian Sang, Nathaniel Rothman, Marcos Díaz-Gay, Ruxandra Teslo, Naser Ansari-Pour, Máire Ní Leathlobhair, Iliana Peneva, William Eagles, Lixing Yang, Ludmil Alexandrov, David C. Wedge, Maria Teresa Landi. Evolutionary trajectories of lung cancer in never smokers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 2 (Late-Breaking, Clinical Trial, and Invited Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(7_Suppl):Abstract nr LB228.
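The abstract above names a Plackett-Luce model for ordering events but gives no formula. As a rough illustration only, the sketch below computes the probability of one complete ordering under the standard Plackett-Luce definition, in which each event is drawn in turn with probability proportional to its "worth" among the events not yet placed. The event labels `loss_X` and `gain_Y` and all worth values are hypothetical, and nothing here reproduces the authors' pipeline, their handling of Battenberg or DPClust output, or their mixture model.

```python
import itertools
import math
from typing import Dict, Sequence


def plackett_luce_log_prob(ordering: Sequence[str], worth: Dict[str, float]) -> float:
    """Log-probability of a complete ordering under a Plackett-Luce model.

    The earliest event is chosen with probability worth / (sum of all worths),
    the next with probability worth / (sum of the remaining worths), and so on.
    """
    remaining = sum(worth[event] for event in ordering)
    log_prob = 0.0
    for event in ordering:
        log_prob += math.log(worth[event]) - math.log(remaining)
        remaining -= worth[event]  # this event is now placed
    return log_prob


# Hypothetical worth parameters: a larger value means the event tends to occur earlier.
worth = {"WGD": 3.0, "loss_X": 2.0, "gain_Y": 1.0}

p_losses_first = math.exp(plackett_luce_log_prob(["WGD", "loss_X", "gain_Y"], worth))
p_gains_first = math.exp(plackett_luce_log_prob(["gain_Y", "loss_X", "WGD"], worth))

# Sanity check: the probabilities of all possible orderings sum to 1.
total = sum(
    math.exp(plackett_luce_log_prob(list(perm), worth))
    for perm in itertools.permutations(worth)
)

print(p_losses_first, p_gains_first, total)
```

With these made-up worths, the ordering that places WGD and the loss first has probability (3/6)(2/3)(1/1) = 1/3, while the ordering that places the gain first has probability (1/6)(2/5)(3/3) = 1/15. Fitting such worths per sample set, and then fitting a mixture of them, is what would separate subsets with different typical orders.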


Abstract

Gastrointestinal (GI) cancers are among the most prevalent cancers affecting the US population. Anatomically, GI cancer includes cancers of the organs in the digestive tract, from the esophagus to the rectum. Advances in molecular oncology have started to transform the therapeutic landscape and offer tremendous promise for patients across diverse lineages. However, the equitable benefit of this approach has been hampered by tumor heterogeneity and the lack of accurate, tumor-agnostic biomarkers with prognostic and predictive utility. Thus, there is a need to identify novel biomarkers that affect gastrointestinal malignancies to improve the management of cancer patients. This study analyzed a single institution's whole genome sequencing dataset to explore the genomic variation in GI cancers. Between 2020 and 2022, 302 AU patients underwent molecular profiling, including whole-exome sequencing (WES) profiling at Caris Life Sciences. WES is an NGS assay that analyzes the DNA sequences of all protein-coding exons in the genome, representing approximately 1-2% of the human genome with coverage of over 22,000 genes. We analyzed the TCGA dataset for comparative analysis, which included cancer data from >1800 GI cancer patients. In addition, survival and network analyses were performed to identify a 6-gene FAT3-related signature for GI cancers. In our cohort, we identified FAT3 (Fat atypical cadherin 3) as the most frequently mutated gene after TP53, APC, and KRAS. FAT3 encodes a cadherin protein involved in cell-cell interactions and adhesion. In the institutional cohort, FAT3 was found to be mutated in 16% of all GI cases. Further analysis of TCGA datasets comprising 1808 GI cancer patients revealed FAT3 mutations in 12% of the cases.

To identify additional prognostic biomarkers associated with FAT3, we performed network analysis and identified a 6-gene FAT3-related signature (FAT3, RYK, FAT2, EGFLAM, NTRK3, IGSF9, and HMGA2) that significantly stratified GI cancer patients based on overall survival, progression-free survival, and disease-specific survival. The perturbation profile of the 6-gene signature was associated with 509 patients or 28% of total GI patients. Further, immune deconvolution analysis of stratified patients revealed increased infiltration of immune cells with immunosuppressive phenotypic properties in GI cancer patients with higher expression of the FAT3-related gene signature. In summary, this analysis reveals the distribution of FAT3 mutational profiles in GI patients and identifies a gene signature that stratifies patients based on survival outcomes. This analysis can provide new tools for patient stratification and therapy implementation, leading to better outcomes for cancer patients.

Citation Format: Pankaj Kumar Ahluwalia, Tiffanie Leeman, Ashis Mondal, Ashutosh Vashisht, Harmanpreet Singh, Ravindra Kolhe. Comprehensive profiling of the FAT3-associated gene signature in deciphering immunophenotypes of gastrointestinal cancers: Analysis of institutional cohort and TCGA dataset [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 1761.
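The abstract above reports that the signature's perturbation profile covered 509 patients, or 28% of GI patients, without saying which cohort is the denominator. The short check below assumes it is the 1808-patient TCGA set, which seems likely because 509 exceeds the 302 patients in the institutional cohort. The helper name `percent` is illustrative only.

```python
def percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Figures quoted in the abstract; the choice of denominator is an assumption.
tcga_patients = 1808
signature_patients = 509

signature_share = percent(signature_patients, tcga_patients)
print(f"{signature_share:.1f}%")  # prints 28.2%
```

Under that assumption the share is about 28.2%, which agrees with the rounded 28% given in the abstract.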



Articles published on Genome Sequencing Datasets

A mapping-free natural language processing-based technique for sequence search in nanopore long-reads.

Linkage Disequilibrium-Informed Deep Learning Framework to Identify Genetic Loci for Alzheimer's Disease Using Whole Genome Sequencing Data.

A comprehensive atlas of nuclear sequences of mitochondrial origin (NUMT) inserted into the pig genome

Identification of 16 novel Alzheimer's disease susceptibility loci using multi-ancestry meta-analyses of clinical Alzheimer's disease and AD-by-proxy cases from four whole genome sequencing datasets.

Pediatric Chordoma: A Tale of Two Genomes.

Copy number variation introduced by a massive mobile element facilitates global thermal adaptation in a fungal wheat pathogen

Diagnosing missed cases of spinal muscular atrophy in genome, exome, and panel sequencing datasets.

The chloroplast genome sequences of Ipomoea alba and I. obscura (Convolvulaceae): genome comparison and phylogenetic analysis

HGG-03. TRUNCATING MUTATIONS IN PPM1D COOPERATE WITH PI3K ALTERATIONS TO DRIVE PEDIATRIC DIFFUSE MIDLINE GLIOMAS

VSNP: a SNP pipeline for the generation of transparent SNP matrices and phylogenetic trees from whole genome sequencing data sets

#3020 A novel copy number analysis identifies human patients with NPHP1 whole gene deletions in previously genetically unsolved cases

A GGC-repeat expansion in ZFHX3 encoding polyglycine causes spinocerebellar ataxia type 4 and impairs autophagy.

Detection of rare variants among nuclei populating the arbuscular mycorrhizal fungal model species Rhizophagus irregularis DAOM197198.

Abstract LB228: Evolutionary trajectories of lung cancer in never smokers

Abstract 3501: Concurrent amplification of TBXT and highly activated enhancers in extrachromosomal DNA (ecDNA) drives chordoma tumorigenesis

Abstract 1761: Comprehensive profiling of the FAT3-associated gene signature in deciphering immunophenotypes of gastrointestinal cancers: Analysis of institutional cohort and TCGA dataset

Untranslated regions (UTRs) are a potential novel source of neoantigens for personalised immunotherapy.

A New Cloud-Native Tool for Pharmacogenetic Analysis.

Impact of dietary fiber on gut microbiota composition, function and gut-brain-modules in healthy adults - a systematic review protocol.

Whole Genome Sequence Dataset of Mycobacterium tuberculosis Strains from Patients of Campania Region
