Large-scale Genome Sequencing Research Articles

Interpretation of variants of uncertain significance (VUSs) remains a challenge in the care of patients with inherited cardiovascular diseases (CVDs); 56% of variants within CVD risk genes are VUS, and machine learning algorithms trained upon large data resources can stratify VUS into higher versus lower probability of contributing to a CVD phenotype. We used ClinVar pathogenic/likely pathogenic and benign/likely benign variants from 47 CVD genes to build a predictive model of variant pathogenicity utilizing measures of evolutionary constraint, deleteriousness, splicogenicity, local pathogenicity, cardiac-specific expression, and population allele frequency. Performance was validated using variants for which the ClinVar pathogenicity assignment changed. Functional validation was assessed using prior studies in >900 identified VUS. The model utility was demonstrated using the Catheterization Genetics cohort. We identified a top-ranked model that accurately prioritized variants for which ClinVar clinical significance had changed (n=663; precision-recall area under the curve, 0.97) and performed well compared with conventional in silico methods. This model (CVD pathogenicity predictor) also had high accuracy in prioritizing VUS with functional effects in vivo (precision-recall area under the curve, 0.58). In Catheterization Genetics, there was a greater burden of VUS with higher CVD pathogenicity predictor scores in individuals with dilated cardiomyopathy compared with controls (P=8.2×10⁻¹⁵). Of individuals in Catheterization Genetics who harbored highly ranked CVD pathogenicity predictor VUS meeting clinical pathogenicity criteria, 27.6% had clinical evidence of disease. Variant prioritization using this model increased genetic diagnosis in Catheterization Genetics participants with a known clinical diagnosis of hypertrophic cardiomyopathy (from 7.8% to 27.2%).
We present a cardiac-specific model for prioritizing variants underlying CVD syndromes with high performance in discriminating the pathogenicity of VUS in CVD genes. Variant review and phenotyping of individuals carrying VUS of pathogenic interest support the clinical utility of this model. This model could also have utility in filtering variants as part of large-scale genomic sequencing studies.
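The abstract reports model performance as precision-recall area under the curve. As a minimal sketch of how such a figure can be computed from ranked predictions, the Python below implements the step-wise "average precision" estimate of that area. The function name, the 1/0 label encoding, and the toy scores are illustrative assumptions; this is not the authors' evaluation code, and the abstract does not say which estimator they used.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 for pathogenic, 0 for benign (assumed encoding).
    scores: model scores, where a higher score means "more likely pathogenic".
    Tied scores are not treated specially; they keep their input order.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank variants from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at the rank where recall increases by 1 / n_pos.
            total += true_pos / rank
    return total / n_pos


# Toy data invented for illustration only (not from the study).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(toy_labels, toy_scores))
```

Unlike ROC-based measures, the expected value for a random ranking is roughly the fraction of positives in the evaluation set, so a precision-recall area is best read alongside that class balance.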

Abstract: Understanding the relationship between genotype and phenotype in breast cancer cells has been challenging at single cell resolution, mainly because existing high-throughput methods are limited to measuring a single modality and data must be integrated indirectly via computational methods. To address this challenge, we developed wellDR-seq, a high-throughput single cell method that can simultaneously measure the whole genome and transcriptome directly from thousands of single cells in parallel. Using this method, we profiled 17,427 single cells in 6 different ER+ breast cancer patients with either premalignant disease (ductal carcinoma in situ) or invasive ductal carcinoma (IDC). From these data we identified the epithelial cell-of-origin in 3 cases, showing that ER+ breast cancer cells originated from Luminal Hormone Responsive (LumHR) cells in the normal breast tissues. We also found that autosomal somatic copy number aberrations were exclusively present in LumSec (< 2%) and LumHR epithelial cells, while chrX gain or loss also occurred in stromal cells (e.g., fibroblasts and endothelial cells) but at a low frequency (<2%) in our datasets. Additionally, our data show the impact of subclonal copy number profiles on gene expression programs, which reflects both gene dosage effects in CNA regions and more complex deregulation in copy-neutral regions. wellDR-seq offers a powerful tool for large-scale simultaneous sequencing of single cell genomes and transcriptomes across different clinical samples, and opens new avenues to investigate genotype-phenotype interactions, identify tumor cell origins, reveal somatic copy number and mutation events, quantify the gene dosage effect, and discern differential gene expression between tumor subclones.

Citation Format: Kaile Wang, Rui Ye, Shanshan Bai, Zhenna Xiao, Lei Yano, Jianzhuo Li, Yiyun Lin, Emi Sei, Steven Lin, Alastair Thompson, Savitri Krishnamurthy, Nicholas Navin. High throughput single cell simultaneous DNA and RNA sequencing identified cell-of-origin of breast cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 6937.

Articles published on Large-scale Genome Sequencing

Leveraging large-scale Mycobacterium tuberculosis whole genome sequence data to characterise drug-resistant mutations using machine learning and statistical approaches

Cardiovascular Disease Pathogenicity Predictor (CVD-PP): A Tissue-Specific In Silico Tool for Discriminating Pathogenicity of Variants of Unknown Significance in Cardiovascular Disease Genes.

Chinese Populations of Magnaporthe oryzae Serving as a Source of Human-Mediated Gene Flow to Asian Countries: A Population Genomic Analysis

Quest for Orthologs in the Era of Biodiversity Genomics.

Linkage Disequilibrium-Informed Deep Learning Framework to Identify Genetic Loci for Alzheimer's Disease Using Whole Genome Sequencing Data.

Large-scale genome sequencing of giant pandas improves the understanding of population structure and future conservation initiatives

How exome sequencing improves the diagnostics and management of men with non-syndromic infertility.

Chromatin remodellers as therapeutic targets.

Population structure and antibiotic resistance of swine extraintestinal pathogenic Escherichia coli from China

Biostatistical Aspects of Whole Genome Sequencing Studies: Preprocessing and Quality Control.

COPO - Managing sample metadata for biodiversity: considerations from the Darwin Tree of Life project

Real-time genomic surveillance for enhanced control of infectious diseases and antimicrobial resistance

Dynamic analysis of SARS-CoV-2 evolution based on different countries

Species-aware DNA language models capture regulatory elements and their evolution

Abstract 6937: High throughput single cell simultaneous DNA and RNA sequencing identified cell-of-origin of breast cancer

A catalogue of chromosome counts for Phylum Nematoda

Mobilisation and analyses of publicly available SARS-CoV-2 data for pandemic responses.

A randomized optimal k-mer indexing approach for efficient parallel genome sequence compression

GSC: efficient lossless compression of VCF files with fast query.

Development of an accurate and rapid method for whole genome characterization of canine parvovirus

