LatinMark: Robust watermarking for latent diffusion models via distribution-preserving rearrangement based on latin hypercube sampling

Similar Papers
  • Book Chapter
  • Citations: 2
  • 10.1016/b978-0-443-13470-8.00008-3
16 - Probabilistic modeling of chloride diffusion in repaired reinforced concrete structures
  • Jan 1, 2024
  • Eco-efficient Repair and Rehabilitation of Concrete Infrastructures
  • Quynh Chau Truong + 2 more

  • Research Article
  • Citations: 18
  • 10.1371/journal.pcbi.1010255
Identifying control ensembles for information processing within the cortico-basal ganglia-thalamic circuit.
  • Jun 23, 2022
  • PLOS Computational Biology
  • Catalina Vich + 3 more

In situations featuring uncertainty about action-reward contingencies, mammals can flexibly adopt strategies for decision-making that are tuned in response to environmental changes. Although the cortico-basal ganglia thalamic (CBGT) network has been identified as contributing to the decision-making process, it features a complex synaptic architecture, comprised of multiple feed-forward, reciprocal, and feedback pathways, that complicates efforts to elucidate the roles of specific CBGT populations in the process by which evidence is accumulated and influences behavior. In this paper we apply a strategic sampling approach, based on Latin hypercube sampling, to explore how variations in CBGT network properties, including subpopulation firing rates and synaptic weights, map to variability of parameters in a normative drift diffusion model (DDM), representing algorithmic aspects of information processing during decision-making. Through the application of canonical correlation analysis, we find that this relationship can be characterized in terms of three low-dimensional control ensembles within the CBGT network that impact specific qualities of the emergent decision policy: responsiveness (a measure of how quickly evidence evaluation gets underway, associated with overall activity in corticothalamic and direct pathways), pliancy (a measure of the standard of evidence needed to commit to a decision, associated largely with overall activity in components of the indirect pathway of the basal ganglia), and choice (a measure of commitment toward one available option, associated with differences in direct and indirect pathways across action channels). These analyses provide mechanistic predictions about the roles of specific CBGT network elements in tuning the way that information is accumulated and translated into decision-related behavior.
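
As a rough illustration of the Latin hypercube sampling mentioned in the abstract above, the sketch below draws one jittered point from each equal-width stratum in every dimension and shuffles the strata independently per dimension. It is a generic, minimal version of the technique and not the authors' code; the function names `latin_hypercube` and `scale`, and the example parameter ranges, are hypothetical.

```python
import random

def latin_hypercube(n_samples, n_dims, rng=None):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in every dimension."""
    rng = rng or random.Random()
    columns = []
    for _ in range(n_dims):
        # One jittered point inside each of the n_samples equal-width strata.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)  # decouple the stratum order across dimensions
        columns.append(col)
    # Transpose the per-dimension columns into a list of sample points.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

def scale(points, bounds):
    """Map unit-cube points onto per-dimension (low, high) parameter ranges."""
    return [tuple(lo + u * (hi - lo) for u, (lo, hi) in zip(p, bounds)) for p in points]

# Hypothetical ranges: a firing rate in [0, 40] and a synaptic weight in [0, 1].
unit_points = latin_hypercube(8, 2, random.Random(0))
configs = scale(unit_points, [(0.0, 40.0), (0.0, 1.0)])
```

Compared with independent uniform draws, every one-dimensional projection of the sample covers its range evenly, which is why the approach suits exploring many network parameters with a limited simulation budget.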

  • Research Article
  • Citations: 12
  • 10.1371/journal.pcbi.1010255.r004
Identifying control ensembles for information processing within the cortico-basal ganglia-thalamic circuit
  • Jun 23, 2022
  • PLoS Computational Biology
  • Catalina Vich + 5 more

  • Conference Article
  • Citations: 2
  • 10.1145/3664647.3681220
DERO: Diffusion-Model-Erasure Robust Watermarking
  • Oct 28, 2024
  • Han Fang + 5 more

The effective denoising demonstrated by the latent diffusion model poses a new threat to image watermarking, as attackers can erase the watermark by performing forward diffusion followed by backward denoising. While such denoising might introduce large distortion in the pixel domain, the image semantics remain similar. Unfortunately, most existing robust watermarking methods fail to withstand such an erasure attack since they are primarily designed for traditional channel distortions. To address this issue, this paper proposes DERO, a diffusion-model-erasure robust watermarking framework. Based on a frequency-domain analysis of the diffusion model's denoising process, we design a destruction and compensation noise layer (DCNL) to approximate the distortion effects caused by latent diffusion model erasure (LDE). In detail, DCNL consists of multi-scale low-pass filtering and a white noise compensation process, where the high-frequency components of the image are first obliterated, and then the full-frequency components are enriched with white noise. Such a process broadly simulates the LDE distortions. In addition, on the extraction side, we cascade a pre-trained variational autoencoder before the decoder to extract the watermark in the latent domain, which closely matches the operation domain of the LDE process. Meanwhile, to improve the robustness of the decoder, we also design a latent feature augmentation (LFA) operation on the latent feature. Through end-to-end training with the DCNL and LFA, DERO achieves robustness against LDE. Our experimental results demonstrate the effectiveness and generalizability of the proposed framework. The LDE robustness improves from 75% with SOTA methods to 96% with DERO.
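
The destruction-then-compensation idea described above can be sketched in a few lines. This is a deliberately simplified 1-D stand-in and not DERO's actual DCNL: a moving-average filter takes the place of the low-pass filtering, the scales are simply averaged, and the names `box_blur` and `destroy_and_compensate`, along with the default radii and noise level, are assumptions made for illustration.

```python
import random

def box_blur(signal, radius):
    """Low-pass a 1-D signal with a moving average (edges use a truncated window)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def destroy_and_compensate(signal, radii=(1, 2, 4), noise_std=0.05, rng=None):
    """Average several low-pass scales, then add white Gaussian noise."""
    rng = rng or random.Random()
    blurred = [box_blur(signal, r) for r in radii]
    # Destruction: averaging the blurred scales suppresses high-frequency content.
    low = [sum(vals) / len(radii) for vals in zip(*blurred)]
    # Compensation: white noise adds energy back across all frequencies.
    return [v + rng.gauss(0.0, noise_std) for v in low]

# A rapidly alternating signal is almost pure high frequency, so it is mostly erased.
distorted = destroy_and_compensate([1.0, -1.0] * 8, rng=random.Random(0))
```

In a training pipeline, a layer of this kind would sit between the watermark encoder and decoder so that the decoder learns to survive the simulated erasure.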

  • Research Article
  • Citations: 3
  • 10.3390/electronics14010025
Latent Diffusion Models for Image Watermarking: A Review of Recent Trends and Future Directions
  • Dec 25, 2024
  • Electronics
  • Hongjun Hur + 3 more

Recent advancements in deep learning-based generative models have simplified image generation, increasing the need for improved source tracing and copyright protection, especially with the efficient, high-quality output of latent diffusion models (LDMs) raising concerns about unauthorized use. This paper provides a comprehensive review of watermarking techniques applied to latent diffusion models, focusing on recent trends and the potential utility of these approaches. Watermarking using latent diffusion models offers the potential to overcome these limitations by embedding watermarks in the latent space during the image generation process. This represents a new paradigm of watermarking that leverages a degree of freedom unavailable in traditional watermarking techniques and underscores the need to explore the potential advancements in watermark technology. LDM-based watermarking allows for the natural internalization of watermarks within the content generation process, enabling robust watermarking without compromising image quality. We categorize the methods based on embedding strategies and analyze their effectiveness in achieving key functionalities—source tracing, copyright protection, and AI-generated content identification. The review highlights the strengths and limitations of current techniques and discusses future directions for enhancing the robustness and applicability of watermarking in the evolving landscape of generative AI.

  • Research Article
  • 10.32347/2077-3455.2025.71.494-509
Application of Neural Networks in Building Architecture and Optimization of Latent Diffusion Models for This Purpose
  • Mar 28, 2025
  • Current problems of architecture and urban planning
  • Galyna Getun + 4 more

This article explores the application of an innovative neural network-based approach, namely latent diffusion models, in the field of architectural design and visualization. Traditional methods of creating architectural visualizations are often labor-intensive, while AI-powered automated assistants enable the optimization of their creation process, focusing human attention on creativity and innovation. In this regard, the use of generative neural networks, particularly latent diffusion models, opens new perspectives for the rapid and efficient creation of diverse architectural visualizations. Latent diffusion models allow achieving high-quality generation of images with complex structures, which is crucial for architectural visualizations characterized by a large number of details and variations. The article describes the operating mechanism of latent diffusion models, as well as the specifics of their application for generating architectural objects, especially for prototyping purposes. Furthermore, the article considers the Low-Rank Adaptation (LoRA) method for fine-tuning pre-trained latent diffusion models. The LoRA method allows efficiently adapting large models to specific tasks with minimal computational costs. In the context of architectural design, this means the possibility of quickly adjusting a general model to generate buildings of a specific style or type, such as modern skyscrapers or historical buildings. The use of the Low-Rank Adaptation method significantly expands the possibilities of rapid autonomous creation of architectural visualizations, allowing architects and designers to quickly generate and explore various project options. The article includes examples of the successful application of latent diffusion models using the LoRA method for generating architectural visualizations, demonstrating the high quality and diversity of the results obtained. 
The research results demonstrate the significant potential of using latent diffusion models for application in the field of architectural design, providing a new level of automation and creativity.
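
To make the Low-Rank Adaptation idea in the abstract above concrete, the sketch below forms the adapted weight W + (alpha / r) · B · A from a frozen weight matrix W and two small trainable factors. It uses plain Python lists so it runs without dependencies; the helper names `matmul` and `lora_weight` and the toy matrices are hypothetical and unrelated to the article's experiments.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the shared inner rank."""
    r = len(a)            # A is r x d_in, B is d_out x r
    delta = matmul(b, a)  # d_out x d_in low-rank update
    s = alpha / r
    return [[wij + s * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]

# Toy example: a 2 x 2 frozen weight with a rank-1 update.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]       # 1 x 2
b = [[1.0], [0.0]]     # 2 x 1
adapted = lora_weight(w, a, b, alpha=2.0)
```

Only A and B are trained, so the number of trainable parameters scales with r · (d_in + d_out) instead of d_in · d_out. With B initialised to zero, the adapted weight starts out identical to the pre-trained one.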

  • Research Article
  • Citations: 12
  • 10.1109/lsp.2024.3453120
DLDiff: Image Detail-Guided Latent Diffusion Model for Low-Light Image Enhancement
  • Jan 1, 2024
  • IEEE Signal Processing Letters
  • Minglong Xue + 3 more

Low-light image enhancement is an essential task in image restoration. Inspired by the diffusion model, related methods have achieved remarkable results in low-level visual tasks. However, such methods struggle with large-scale images, leading to problems such as excessive resource consumption and low recovery efficiency. To address this, we propose a detail-guided latent space low-light image enhancement diffusion model called DLDiff. Leveraging the generative power of the latent diffusion model, we explore ways to speed up inference while producing excellent perceptual fidelity. Specifically, we first employ the latent diffusion model to transform low-light image features into a latent space representation, thereby reducing computational resource consumption. Next, we design a lightweight detail prompt module that combines cross-convolution and vast-receptive-field convolution blocks. This module enhances the fine-grained details of the image, effectively supplements multiscale feature information, and minimizes feature loss in the latent space. Furthermore, we devise the content-aware loss group to facilitate learning noise and image information, enhancing the model's recovery capability, guiding stable sampling, and constraining diverse content generation. Through extensive experiments, we demonstrate the model's significant efficiency and quality advantages in low-light image enhancement tasks.

  • Research Article
  • 10.22214/ijraset.2024.65751
Advancing Image Synthesis: Methods and Applications of Latent Diffusion Models
  • Dec 31, 2024
  • International Journal for Research in Applied Science and Engineering Technology
  • Soumika Chakraborty

Diffusion models (DMs) have revolutionized the field of generative modelling, delivering exceptional results in tasks such as image synthesis, inpainting, and super-resolution. Despite their success, the reliance on pixel-space processing in these models has imposed substantial computational challenges, requiring hundreds of GPU days for training and significant resources for inference. In this work, we introduce Latent Diffusion Models (LDMs), an innovative framework that addresses these limitations by operating within a perceptually compressed latent space derived from a pretrained autoencoder. This paradigm shift significantly reduces the computational complexity of both training and inference while maintaining the high fidelity and diversity of the generated outputs. LDMs leverage a two-stage approach: a pretrained autoencoder for efficient latent-space representation and a diffusion model trained directly within this space. By introducing cross-attention layers into the architecture, LDMs also support flexible conditioning on various modalities such as text descriptions, semantic maps, or low-resolution images. This versatility enables the model to perform a range of tasks, including text-to-image generation, class-conditional image synthesis, and high-resolution super-resolution. Our experiments demonstrate that LDMs achieve competitive or state-of-the-art performance across multiple benchmarks, including CelebA-HQ, ImageNet, and MS-COCO, while requiring significantly fewer computational resources than pixel-space diffusion models. For instance, LDMs reduce training time by up to 2.7× and inference memory requirements by 50%, all while improving sample quality. This work highlights the potential of latent-space generative models to democratize access to advanced generative AI technologies, making them feasible for researchers and practitioners with limited computational resources.
At the same time, we discuss the ethical considerations of using such models, including their potential misuse for creating manipulated content. Latent Diffusion Models pave the way for efficient, scalable, and high-quality image synthesis, providing a robust foundation for future advancements in generative modelling.
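
The second stage described above, a diffusion model trained directly in latent space, relies on noising a latent vector to an arbitrary timestep. A minimal sketch of the standard closed-form forward step, z_t = sqrt(ᾱ_t) · z_0 + sqrt(1 − ᾱ_t) · ε, is shown below. The latent here is a plain list of floats standing in for an autoencoder output, and the function names and the linear noise schedule are assumptions for illustration, not details taken from the paper.

```python
import math
import random

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod

def noise_latent(z0, betas, t, rng=None):
    """Sample z_t given z_0 in closed form; also return the noise that was used."""
    rng = rng or random.Random()
    ab = alpha_bar(betas, t)
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(ab) * z + math.sqrt(1.0 - ab) * e for z, e in zip(z0, eps)]
    return zt, eps

# Hypothetical linear schedule with 10 steps; z0 stands in for an encoded image.
betas = [0.01 + 0.02 * i for i in range(10)]
z0 = [0.3, -1.2, 0.8, 0.0]
zt, eps = noise_latent(z0, betas, t=5, rng=random.Random(0))
```

During training, a denoising network would be asked to predict `eps` from `zt` and `t`. Because `z0` is much smaller than the original image, each such step is correspondingly cheaper than in pixel space.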

  • Research Article
  • Citations: 6
  • 10.1109/tip.2025.3580269
Contrastive Conditional Latent Diffusion for Audio-Visual Segmentation.
  • Jan 1, 2025
  • IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
  • Yuxin Mao + 6 more

Audio-visual Segmentation (AVS) is conceptualized as a conditional generation task, where audio is considered as the conditional variable for segmenting the sound producer(s). In this case, audio should be extensively explored to maximize its contribution for the final segmentation task. We propose a contrastive conditional latent diffusion model for audio-visual segmentation (AVS) to thoroughly investigate the impact of audio, where the correlation between audio and the final segmentation map is modeled to guarantee the strong correlation between them. To achieve semantic-correlated representation learning, our framework incorporates a latent diffusion model. The diffusion model learns the conditional generation process of the ground-truth segmentation map, resulting in ground-truth aware inference during the denoising process at the test stage. As our model is conditional, it is vital to ensure that the conditional variable contributes to the model output. We thus extensively model the contribution of the audio signal by minimizing the density ratio between the conditional probability of the multimodal data, e.g. conditioned on the audio-visual data, and that of the unimodal data, e.g. conditioned on the audio data only. In this way, our latent diffusion model via density ratio optimization explicitly maximizes the contribution of audio for AVS, which can then be achieved with contrastive learning as a constraint, where the diffusion part serves as the main objective to achieve maximum likelihood estimation, and the density ratio optimization part imposes the constraint. By adopting this latent diffusion model via contrastive learning, we effectively enhance the contribution of audio for AVS. The effectiveness of our solution is validated through experimental results on the benchmark dataset. Code and results are online via our project page: https://github.com/OpenNLPLab/DiffusionAVS.

  • Research Article
  • Citations: 5
  • 10.3390/app131810379
Novel Paintings from the Latent Diffusion Model through Transfer Learning
  • Sep 16, 2023
  • Applied Sciences
  • Dayin Wang + 2 more

With the development of deep learning, image synthesis has made unprecedented progress in the past few years. Image synthesis models, represented by diffusion models, have demonstrated stable and high-fidelity image generation. However, the traditional diffusion model computes in pixel space, which is memory-heavy and compute-heavy. Therefore, to reduce the computational cost and improve the accessibility of diffusion models, we train the diffusion model in latent space. In this paper, we aim to create novel paintings from existing paintings based on powerful diffusion models. Because the cross-attention layer is adopted in the latent diffusion model, we can create novel paintings with conditional text prompts. However, direct training of the diffusion model on the limited dataset is non-trivial. Therefore, inspired by transfer learning, we train the diffusion model with pre-trained weights, which eases the training process and enhances the image synthesis results. Additionally, we introduce the GPT-2 model to expand text prompts for detailed image generation. To validate the performance of our model, we train the model on paintings of a specific artist from the WikiArt dataset. To make up for the missing image context descriptions in the WikiArt dataset, we adopt a pre-trained language model to generate corresponding context descriptions automatically and clean wrong descriptions manually, and we will make them available to the public. Experimental results demonstrate the capacity and effectiveness of the model.

  • Research Article
  • 10.3390/bioengineering12070764
Implementation of a Conditional Latent Diffusion-Based Generative Model to Synthetically Create Unlabeled Histopathological Images.
  • Jul 15, 2025
  • Bioengineering (Basel, Switzerland)
  • Mahfujul Islam Rumman + 6 more

Generative image models have revolutionized artificial intelligence by enabling the synthesis of high-quality, realistic images. These models utilize deep learning techniques to learn complex data distributions and generate novel images that closely resemble the training dataset. Recent advancements, particularly in diffusion models, have led to remarkable improvements in image fidelity, diversity, and controllability. In this work, we investigate the application of a conditional latent diffusion model in the healthcare domain. Specifically, we trained a latent diffusion model using unlabeled histopathology images. Initially, these images were embedded into a lower-dimensional latent space using a Vector Quantized Generative Adversarial Network (VQ-GAN). Subsequently, a diffusion process was applied within this latent space, and clustering was performed on the resulting latent features. The clustering results were then used as a conditioning mechanism for the diffusion model, enabling conditional image generation. Finally, we determined the optimal number of clusters using cluster validation metrics and assessed the quality of the synthetic images through quantitative methods. To enhance the interpretability of the synthetic image generation process, expert input was incorporated into the cluster assignments.

  • Research Article
  • Citations: 15
  • 10.1016/j.neunet.2024.106762
HiddenSinger: High-quality singing voice synthesis via neural audio codec and latent diffusion models
  • Sep 27, 2024
  • Neural Networks
  • Ji-Sang Hwang + 2 more

  • Conference Article
  • Citations: 6
  • 10.1145/3627673.3679547
Hierarchical Graph Latent Diffusion Model for Conditional Molecule Generation
  • Oct 21, 2024
  • Tian Bian + 8 more

Recently, generative models based on the diffusion process have emerged as a promising direction for automating the design of molecules. However, directly adding continuous Gaussian noise to discrete graphs leads to the problem that the generated data do not conform to the discrete graph data distribution in the training set. Current graph diffusion models either corrupt discrete data through a transition matrix or relax the discrete data to continuous space for the diffusion process. These approaches make it hard to perform extensible conditional generation, such as adapting to text-based conditions, due to the lack of embedding representations and require significant computation resources due to the diffusion process of the bond type matrix. This paper introduces the Hierarchical Graph Latent Diffusion Model (HGLDM), a novel variant of latent diffusion models that overcomes the problem of applying continuous diffusion models directly to discrete graph data. Meanwhile, based on the latent diffusion framework, HGLDM avoids the issues of computational consumption and lack of embeddings for extensible conditional generation. In addition, by comparing the HGLDM with its variant, the Graph Latent Diffusion Model (GLDM), which only has graph-level embeddings, we validate the advantage of the hierarchical graph structure for capturing the relationship between structure information and molecular properties. We evaluate the performance of our model through various conditional generation tasks, demonstrating its superior performance.

  • Research Article
  • Citations: 1
  • 10.1093/bioinformatics/btaf426
Geometry-complete latent diffusion model for 3D molecule generation
  • Jul 30, 2025
  • Bioinformatics
  • Qunhao Zhang + 5 more

Motivation: Generative models, especially diffusion models, have recently made remarkable progress in fields such as graph generation and drug design. However, current diffusion-based 3D molecule generation models still struggle with adequately modeling the true data distribution. Results: We designed the geometry-complete latent diffusion model (GCLDM) to enhance the modeling capacity of diffusion models. A geometry-complete autoencoder for feature mapping between atom space and latent space is introduced. In addition, the latent space diffusion model can model continuous latent representations, which is helpful in learning to fit multi-modal feature distributions for the diffusion model. The comparative experimental results demonstrate that GCLDM could fit the true distribution of molecules well and outperform other state-of-the-art methods. Availability and implementation: Our code and data are provided at https://github.com/charlotte0104/GCLDM-for-3d-molucule-generation and https://zenodo.org/records/15773195.

  • Research Article
  • Citations: 47
  • 10.1016/j.patcog.2024.111198
Frequency domain-based latent diffusion model for underwater image enhancement
  • Nov 22, 2024
  • Pattern Recognition
  • Jingyu Song + 6 more
