Cross-Modal Denoising and Integration of Spatial Multi-Omics Data with CANDIES.
Spatial multi-omics data offer a powerful framework for integrating diverse molecular profiles while maintaining the spatial organization of cells. However, inherent variations in data quality and noise levels across different modalities pose significant challenges to accurate integration and analyses. In this paper, we introduce CANDIES, which leverages a conditional diffusion model and contrastive learning to effectively denoise and integrate spatial multi-omics data. With our innovative model and algorithm designs, CANDIES not only enhances the quality of spatial multi-omics data, but also yields a unified and comprehensive joint representation, thereby empowering many downstream analyses. We conduct extensive evaluations on diverse synthetic and real datasets, including MISAR-seq data from the mouse brain, spatial CITE-seq data from human skin biopsy tissue, spatial Mux-seq and spatial ATAC-RNA-seq data from the mouse embryo, and 10x Visium data from human lymph nodes. CANDIES shows superior performance on various downstream tasks, including denoising, spatial domain identification, spatiotemporal trajectory reconstruction, and spatial association mapping for complex human traits. In particular, we show that CANDIES representations can be integrated with the rich resources from genome-wide association studies (GWASs), allowing the spatial domains to be linked with complex human traits, yielding spatially resolved interpretations of complex traits in their relevant tissues.
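The abstract names contrastive learning as one ingredient of the joint representation but gives no formula. As a rough illustration only, the sketch below implements a generic InfoNCE-style loss between two sets of per-spot embeddings, one per modality. It is not the CANDIES objective: the function names, the temperature value, and the toy embeddings are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(z_a, z_b, temperature=0.5):
    """Generic InfoNCE loss (hypothetical helper, not from the paper).

    z_a[i] and z_b[i] are embeddings of the same spot in two modalities.
    The matching pair is the positive; every other spot in z_b is a negative.
    """
    n = len(z_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(z_a[i], z_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n

# Toy embeddings for three spots (illustrative values only).
modality_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
modality_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
# Breaking the spot correspondence should raise the loss.
shuffled_b = [modality_b[1], modality_b[2], modality_b[0]]

loss_aligned = info_nce(modality_a, modality_b)
loss_shuffled = info_nce(modality_a, shuffled_b)
print(loss_aligned < loss_shuffled)
```

The loss is low when each spot's two embeddings are closer to each other than to the embeddings of other spots, which is the property a cross-modal integration objective of this kind relies on.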
- Preprint Article
- 10.1101/2025.04.17.649333
- Apr 22, 2025
- bioRxiv (Cold Spring Harbor Laboratory)
Spatial multi-omics data offer a powerful framework for integrating diverse molecular profiles while maintaining the spatial organization of cells. However, inherent variations in data quality and noise levels across different modalities pose significant challenges to accurate integration and analyses. In this paper, we introduce CANDIES, which leverages a conditional diffusion model and contrastive learning to effectively denoise and integrate spatial multi-omics data. With our innovative model and algorithm designs, CANDIES not only enhances the quality of spatial multi-omics data, but also yields a unified and comprehensive joint representation, thereby empowering many downstream analyses. We conduct extensive evaluations on diverse synthetic and real datasets, including spatial CITE-seq data from human skin biopsy tissue, MISAR-seq data from the mouse brain, spatial ATAC-RNA-seq data from the mouse embryo, and 10x Visium data from human lymph nodes. CANDIES shows superior performance on various downstream tasks, including denoising, spatial domain identification, spatiotemporal trajectory reconstruction, and spatial association mapping for complex human traits. In particular, we show that CANDIES representations can be integrated with the rich resources from genome-wide association studies (GWASs), allowing the spatial domains to be linked with complex human traits, yielding spatially resolved interpretation of complex traits in their relevant tissues.
- Peer Review Report
- 10.7554/elife.82535.sa2
- Dec 13, 2022
Integrating chromatin accessibility and gene expression data into context-specific regulatory networks can provide better regulatory categories for heritability enrichment and relevant tissue identification.
- Supplementary Content
19
- 10.1093/bfgp/elu012
- Jun 10, 2014
- Briefings in Functional Genomics
Genome-wide association studies have been successful in identifying common variants that impact complex human traits and diseases. However, despite this success, the joint effects of these variants explain only a small proportion of the genetic variance in these phenotypes, leading to speculation that rare genetic variation might account for much of the ‘missing heritability’. Consequently, there has been an exciting period of research and development into the methodology for the analysis of rare genetic variants, typically by considering their joint effects on complex traits within the same functional unit or genomic region. In this review, we describe a general framework for modelling the joint effects of rare genetic variants on complex traits in association studies of unrelated individuals. We summarise a range of widely used association tests that have been developed from this model and provide an overview of the relative performance of these approaches from published simulation studies.
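The review describes tests that consider the joint effects of rare variants within one functional unit. One simple member of that family is a burden test, sketched below under stated assumptions: rare alleles in a region are summed per individual and the trait is regressed on that count. The MAF threshold, the helper names, and the toy genotypes, frequencies, and trait values are all hypothetical and are not taken from the review.

```python
import math

def burden_scores(genotypes, maf, maf_threshold=0.01):
    """Sum minor-allele counts over variants with MAF below the threshold."""
    rare = [j for j, freq in enumerate(maf) if freq < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]

def simple_regression(x, y):
    """Ordinary least squares of y on x; returns (slope, standard error)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se

# Rows are individuals, columns are variants in one region (toy data).
genotypes = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 2, 1],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
]
# Allele frequencies as if taken from a hypothetical reference panel;
# the middle variant is common and is excluded from the burden.
maf = [0.005, 0.2, 0.008]
trait = [0.1, 1.2, 0.9, -0.2, 2.1, 0.0]

burden = burden_scores(genotypes, maf)
slope, se = simple_regression(burden, trait)
print(burden, round(slope, 3), round(slope / se, 3))
```

Collapsing to a single count assumes all rare variants in the region act in the same direction. Variance-component tests relax that assumption, at some cost in power when the assumption does hold.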
- Research Article
39
- 10.1016/j.ajhg.2021.02.006
- Feb 23, 2021
- The American Journal of Human Genetics
Integrative analysis of liver-specific non-coding regulatory SNPs associated with the risk of coronary artery disease
- Research Article
- 10.1093/bib/bbag218
- May 4, 2026
- Briefings in bioinformatics
Integrating single-cell and spatial multi-omics data is essential for resolving cellular identity, regulatory programs and tissue organization, yet remains challenging under realistic experimental designs. In practice, data are often incomplete, unpaired and partially overlapping across batches and platforms, resulting in mosaic settings where missingness, batch effects and limited correspondence are tightly coupled. Existing methods typically rely on shared cells, explicit anchors or post hoc mapping, and can become unstable as overlap diminishes or spatial structure is not explicitly modeled. Here we present GAMMI (Graph-guided Adversarial Mosaic Multi-omics Integration), a unified graph learning framework for mosaic integration of heterogeneous single-cell and spatial multi-omics data. Rather than performing direct cell or sample alignment, GAMMI learns biologically meaningful relational structure by jointly embedding cells and molecular features in a shared latent space using heterogeneous graphs that encode cell-feature, feature-feature and spatial adjacency relationships. An edge-based contrastive objective with missingness-aware negative sampling avoids false-negative supervision under unobserved modalities, while adversarial domain adaptation suppresses batch-associated variation at the embedding level. Spatial data are incorporated as structurally informative constraints during learning, enabling systematic enrichment of molecular representations across spatial locations. Across diverse mosaic single-cell benchmarks and spatial datasets, GAMMI consistently outperforms state-of-the-art methods in biological conservation, batch correction and spatial reconstruction, including in low-overlap and fully unpaired regimes.
- Conference Article
9
- 10.1109/hicss.1996.493251
- Jan 1, 1996
The proliferation of GIS technology has greatly increased the access to and the usage of spatial data. Making maps is relatively easy even for those who do not have much cartographic training. Nonetheless, concerns about spatial data quality among GIS and spatial data users have only just begun to emerge, partly because information about spatial data quality is not readily available or useful. Metadata, which refer to data describing data, include the quality and accuracy information of the data. The Federal Geographic Data Committee has proposed content standards of metadata for spatial databases. However, the standards are not adequate to document the spatial variation in data quality in geographic data. The paper argues that information about the quality of spatial data over a geographical area, which can be regarded as spatial metadata, should be derived and reported to help users of spatial data to make intelligent spatial decisions or policy formulations. While cartographers focus on the representation of spatial data quality, and statisticians emphasize the quantitative measures of data quality, this paper proposes that GIS are logical tools to assess certain types of error in spatial databases because GIS are widely used to gather, manipulate, analyze, and display spatial data. A framework is proposed to derive several types of data quality information using GIS. These types of quality information include positional accuracy, completeness, attribute accuracy, and to some extent logical consistency. However, not every type of spatial metadata can be derived from GIS.
- Research Article
2
- 10.4172/2155-6180.1000228
- Jan 1, 2015
- Journal of Biometrics & Biostatistics
Over one thousand genome-wide association studies (GWAS) have been conducted in the past decade. Increasing biological evidence suggests the polygenic genetic architecture of complex traits: a complex trait is affected jointly by many risk variants with small or moderate effects. Meanwhile, recent progress in GWAS suggests that complex human traits may share common genetic bases, which is known as "pleiotropy". To further improve the statistical power of detecting risk genetic variants in GWAS, we propose a penalized regression method to analyze the GWAS dataset of primary interest by incorporating information from other related GWAS. The proposed method does not require individual-level genotype and phenotype data from other related GWAS, making it useful when only summary statistics are available. Simulation studies showed that the proposed approach had satisfactory performance. We applied the proposed method to analyze a body mass index (BMI) GWAS dataset from a European American (EA) population and achieved improvement over single GWAS analysis.
- Book Chapter
36
- 10.1016/b978-0-12-380862-2.00004-7
- Jan 1, 2010
- Advances in Genetics
4 - Multigenic Modeling of Complex Disease by Random Forests
- Research Article
40
- 10.1016/j.ajhg.2012.05.020
- Jul 1, 2012
- American journal of human genetics
Many genetic loci and SNPs associated with many common complex human diseases and traits are now identified. The total genetic variance explained by these loci for a trait or disease, however, has often been very small. Much of the "missing heritability" has been revealed to be hidden in the genome among the large number of variants with small effects. Several recent studies have reported the presence of multiple independent SNPs and genetic heterogeneity in trait-associated loci. It is therefore reasonable to speculate that such a phenomenon could be common among loci known to be associated with a complex trait or disease. For testing this hypothesis, a total of 117 loci known to be associated with rheumatoid arthritis (RA), Crohn disease (CD), type 1 diabetes (T1D), or type 2 diabetes (T2D) were selected. The presence of multiple independent effects was assessed in the case-control samples genotyped by the Wellcome Trust Case Control Consortium study and imputed with SNP genotype information from the HapMap Project and the 1000 Genomes Project. Eleven loci with evidence of multiple independent effects were identified in the study, and the number was expected to increase at larger sample sizes and improved statistical power. The variance explained by the multiple effects in a locus was much higher than the variance explained by the single reported SNP effect. The results thus significantly improve our understanding of the allelic structure of these individual disease-associated loci, as well as our knowledge of the general genetic mechanisms of common complex traits and diseases.
- Research Article
25
- 10.1186/s12864-015-1513-5
- Apr 16, 2015
- BMC Genomics
Background: The genotype information carried by genome-wide association studies (GWAS) seems to have the potential to explain more of the ‘missing heritability’ of complex human phenotypes, given improved statistical approaches. Several lines of evidence support the involvement of microRNA (miRNA) and other non-coding RNA in complex human traits and diseases. We employed a novel, genetic annotation-informed enrichment method for GWAS that captures more polygenic effects than standard GWAS analysis, to investigate whether miRNA-tagging single nucleotide polymorphisms (SNPs) are enriched for associations with 15 complex human phenotypes. We then leveraged the enrichment using a conditional False Discovery Rate (condFDR) approach to assess any improvement in the detection of individual miRNA SNPs associated with the disorders. Results: We found SNPs tagging miRNA transcription regions to be significantly enriched for associations with 10 of 15 phenotypes. The enrichment remained significant after controlling for affiliation to other genomic categories, and was confirmed by replication. Enrichment was also found in miRNA binding sites for 10 of 15 phenotypes, albeit only at nominal significance. Leveraging the enrichment in the condFDR framework, we observed a 2-4-fold increase in discovery of SNPs tagging miRNA regions. Conclusions: Our results suggest that miRNAs play an important role in the polygenic architecture of complex human disorders and traits, and therefore that miRNAs are a genomic category that can and should be used to improve gene discovery.
- Research Article
10
- 10.1371/journal.pgen.1010494
- Nov 7, 2022
- PLOS Genetics
Natural selection shapes the genetic architecture of many human traits. However, the prevalence of different modes of selection on genomic regions associated with variation in traits remains poorly understood. To address this, we developed an efficient computational framework to calculate positive and negative enrichment of different evolutionary measures among regions associated with complex traits. We applied the framework to summary statistics from >900 genome-wide association studies (GWASs) and 11 evolutionary measures of sequence constraint, population differentiation, and allele age while accounting for linkage disequilibrium, allele frequency, and other potential confounders. We demonstrate that this framework yields consistent results across GWASs with variable sample sizes, numbers of trait-associated SNPs, and analytical approaches. The resulting evolutionary atlas maps diverse signatures of selection on genomic regions associated with complex human traits on an unprecedented scale. We detected positive enrichment for sequence conservation among trait-associated regions for the majority of traits (>77% of 290 high power GWASs), which included reproductive traits. Many traits also exhibited substantial positive enrichment for population differentiation, especially among hair, skin, and pigmentation traits. In contrast, we detected widespread negative enrichment for signatures of balancing selection (51% of GWASs) and absence of enrichment for evolutionary signals in regions associated with late-onset Alzheimer's disease. These results support a pervasive role for negative selection on regions of the human genome that contribute to variation in complex traits, but also demonstrate that diverse modes of evolution are likely to have shaped trait-associated loci. 
This atlas of evolutionary signatures across the diversity of available GWASs will enable exploration of the relationship between the genetic architecture and evolutionary processes in the human genome.
- Research Article
45
- 10.1136/amiajnl-2012-001519
- Jul 1, 2013
- Journal of the American Medical Informatics Association
Background: While genome-wide association studies (GWAS) of complex traits have revealed thousands of reproducible genetic associations to date, these loci collectively confer very little of the heritability of their respective diseases and, in general, have contributed little to our understanding of the underlying disease biology. Physical protein interactions have been utilized to increase our understanding of human Mendelian disease loci but have yet to be fully exploited for complex traits. Methods: We hypothesized that protein interaction modeling of GWAS findings could highlight important disease-associated loci and unveil the role of their network topology in the genetic architecture of diseases with complex inheritance. Results: Network modeling of proteins associated with the intragenic single nucleotide polymorphisms of the National Human Genome Research Institute catalog of complex trait GWAS revealed that complex trait associated loci are more likely to be hub and bottleneck genes in available, albeit incomplete, networks (OR=1.59, Fisher's exact test p<2.24×10^-12). Network modeling also prioritized novel type 2 diabetes (T2D) genetic variations from the Finland–USA Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics and the Wellcome Trust GWAS data, and demonstrated the enrichment of hubs and bottlenecks in prioritized T2D GWAS genes. The potential biological relevance of the T2D hub and bottleneck genes was revealed by their increased number of first degree protein interactions with known T2D genes according to several independent sources (p<0.01, probability of being first interactors of known T2D genes). Conclusion: Virtually all common diseases are complex human traits, and thus the topological centrality in protein networks of complex trait genes has implications in genetics, personal genomics, and therapy.
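The enrichment result above is reported as an odds ratio with a Fisher's exact test p-value. The sketch below shows how such a 2×2 enrichment could be computed with the standard library alone, using a one-sided test for over-representation. The counts are invented for illustration and are not the study's data; the function name is likewise an assumption.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, P(top-left cell >= a)) under the
    hypergeometric null with fixed row and column totals.
    """
    row1 = a + b          # e.g. trait-associated genes
    col1 = a + c          # e.g. hub genes
    n = a + b + c + d
    denom = comb(n, col1)
    p_value = 0.0
    for k in range(a, min(row1, col1) + 1):
        p_value += comb(row1, k) * comb(n - row1, col1 - k) / denom
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, p_value

# Hypothetical counts: rows = trait-associated / other genes,
# columns = hub / non-hub in some protein interaction network.
odds_ratio, p_value = fisher_exact_greater(30, 70, 200, 700)
print(round(odds_ratio, 2), p_value)
```

Exact integer binomial coefficients keep the tail sum accurate for tables of this size. For very large tables, a log-space implementation or a statistics library would be the more practical choice.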
- Research Article
3247
- 10.1038/ng.3538
- Mar 28, 2016
- Nature Genetics
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with human complex traits. However, the genes or functional DNA elements through which these variants exert their effects on the traits are often unknown. We propose a method (called SMR) that integrates summary-level data from GWAS with data from expression quantitative trait locus (eQTL) studies to identify genes whose expression levels are associated with a complex trait because of pleiotropy. We apply the method to five human complex traits using GWAS data on up to 339,224 individuals and eQTL data on 5,311 individuals, and we prioritize 126 genes (for example, TRAF1 and ANKRD55 for rheumatoid arthritis and SNX19 and NMRAL1 for schizophrenia), of which 25 genes are new candidates; 77 genes are not the nearest annotated gene to the top associated GWAS SNP. These genes provide important leads to design future functional studies to understand the mechanism whereby DNA variation leads to complex trait variation.
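The abstract describes combining GWAS and eQTL summary statistics at a shared variant but does not spell out the arithmetic. The sketch below assumes the commonly cited form of the SMR test: the effect of expression on the trait is estimated as the ratio of the SNP-trait effect to the SNP-expression effect, and the test statistic is built from the two z-scores and compared against a chi-square distribution with one degree of freedom. The function name and all input values are hypothetical, and the published method includes further steps, such as distinguishing pleiotropy from linkage, that are not shown here.

```python
import math

def smr_test(b_zx, se_zx, b_zy, se_zy):
    """Summary-data ratio estimate and approximate test (illustrative sketch).

    b_zx, se_zx: SNP effect on gene expression and its standard error (eQTL).
    b_zy, se_zy: SNP effect on the trait and its standard error (GWAS).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    # Estimated effect of expression on the trait.
    b_xy = b_zy / b_zx
    # Approximate chi-square (1 df) statistic from the two z-scores.
    t_smr = (z_zy ** 2 * z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)
    # Upper-tail probability of a chi-square with 1 df.
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_value

# Hypothetical summary statistics for one top cis-eQTL.
b_xy, t_smr, p_value = smr_test(b_zx=0.5, se_zx=0.05, b_zy=0.04, se_zy=0.01)
print(round(b_xy, 3), round(t_smr, 3), p_value)
```

Because the statistic is bounded by the smaller of the two squared z-scores, a gene can only reach significance when both the eQTL and the GWAS signal at the variant are strong.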
- Research Article
30
- 10.1002/gepi.21675
- Sep 5, 2012
- Genetic epidemiology
Genome-wide association studies have been successful in identifying loci contributing effects to a range of complex human traits. The majority of reproducible associations within these loci are with common variants, each of modest effect, which together explain only a small proportion of heritability. It has been suggested that much of the unexplained genetic component of complex traits can thus be attributed to rare variation. However, genome-wide association study genotyping chips have been designed primarily to capture common variation, and thus are underpowered to detect the effects of rare variants. Nevertheless, we demonstrate here, by simulation, that imputation from an existing scaffold of genome-wide genotype data up to high-density reference panels has the potential to identify rare variant associations with complex traits, without the need for costly re-sequencing experiments. By application of this approach to genome-wide association studies of seven common complex diseases, imputed up to publicly available reference panels, we identify genome-wide significant evidence of rare variant association in PRDM10 with coronary artery disease and multiple genes in the major histocompatibility complex (MHC) with type 1 diabetes. The results of our analyses highlight that genome-wide association studies have the potential to offer an exciting opportunity for gene discovery through association with rare variants, conceivably leading to substantial advancements in our understanding of the genetic architecture underlying complex human traits.