Articles published on Generation Models
14009 Search results
- New
- Research Article
- 10.1016/j.neunet.2025.108492
- May 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Dong Liu + 3 more
SDEIT: Semantic-Driven Electrical Impedance Tomography.
- New
- Research Article
- 10.1016/j.bspc.2026.109584
- May 1, 2026
- Biomedical Signal Processing and Control
- Yu Li + 3 more
Multimodal Brownian bridge diffusion model for controllable synthetic medical image generation
- New
- Research Article
- 10.1016/j.jbi.2026.105011
- May 1, 2026
- Journal of biomedical informatics
- Ziyang Li + 3 more
Knowledge-Guided and Reinforced Selective State Space Model for radiology report generation.
- New
- Addendum
- 10.1016/j.cageo.2026.106138
- May 1, 2026
- Computers & Geosciences
- Roberto Miele + 1 more
Corrigendum to “Diffusion models for multivariate subsurface generation and efficient probabilistic inversion” [Computers & Geosciences 207 (2026) 106076]
- New
- Research Article
- 10.1016/j.inffus.2025.103912
- May 1, 2026
- Information Fusion
- Ting Yun + 5 more
LGINet: Linguistic guided image diffusion model for tree species generation and identification from aerial imagery
- New
- Research Article
- 10.1016/j.isprsjprs.2026.02.037
- May 1, 2026
- ISPRS Journal of Photogrammetry and Remote Sensing
- Min Zhao + 3 more
SuperSTF: A latent diffusion model for cloud-free spatiotemporal remote sensing image fusion
- New
- Research Article
- 10.1016/j.inffus.2025.104003
- May 1, 2026
- Information Fusion
- Angelo Moroncelli + 5 more
• A dual taxonomy is introduced linking generative AI tools with reinforcement learning.
• First review to analyze RL training and fine-tuning of generative policies for robotics control tasks.
• Covers 245 papers integrating Transformer and Diffusion-based architectures into RL pipelines.
• Highlights key roles of LLMs, VLMs, diffusion models, world and video prediction models in robotics policy learning.
• Identifies open challenges in grounding, scalability, and safety of robotics generative policies.
Recently, generative AI and reinforcement learning (RL) have been redefining what is possible for AI agents that take information flows as input and produce intelligent behavior. As a result, we are seeing similar advancements in embodied AI and robotics for control policy generation. Our review paper examines the integration of generative AI models with RL to advance robotics. Our primary focus is on the duality between generative AI and RL for downstream robotics tasks. Specifically, we investigate: (1) the role of prominent generative AI tools as modular priors for multi-modal input fusion in RL tasks; and (2) how RL can train, fine-tune, and distill generative models for policy generation, such as VLA models, similarly to RL applications in large language models. We then propose a new taxonomy based on a considerable number of selected papers. Lastly, we identify open challenges in model scalability, adaptation, and grounding, and give recommendations and insights on future research directions. We reflect on which generative AI models best fit the RL tasks and why. Conversely, we reflect on important issues inherent to RL-enhanced generative policies, such as safety concerns and failure modes, and on the limitations of current methods. A curated collection of relevant research papers is maintained on our GitHub repository, serving as a resource for ongoing research and development in this field.
- New
- Research Article
- 10.1016/j.iswa.2026.200659
- May 1, 2026
- Intelligent Systems with Applications
- Atiquer Rahman Sarkar + 4 more
Assessment of differentially private fine-tuning of large language models for synthetic clinical note generation
- New
- Research Article
- 10.1016/j.jfca.2026.109068
- May 1, 2026
- Journal of Food Composition and Analysis
- Anum Mehmood + 9 more
Coconut shows unique physiological characteristics and application value at different developmental stages, and accurate identification and classification of its internal developmental stages are important for research and planting management. With the rapid development of CT nondestructive testing and artificial intelligence technology, developmental monitoring based on the internal morphological characteristics of coconuts has become possible. However, because few coconut CT samples are available, the recognition accuracy of existing classification models for internal developmental stages is significantly restricted. In this study, a method for classifying multiple developmental stages of coconuts based on few-shot image generation is proposed. First, an improved few-shot image generation model, FastGAN-Pro, is proposed, which is capable of generating higher-quality CT images of coconuts at different developmental stages from a small amount of training data. Building on this, a multi-stage classification method based on transfer learning is proposed, using two pre-trained models, VGG16 and ResNet18. The experimental results show that, compared with the baseline model, the FID values of FastGAN-Pro are reduced by about 21.08% on average, and the generated images are more visually.
• FastGAN-Pro boosts coconut CT image realism and quality in few-shot scenarios.
• Classification accuracy improves by up to 14.16% with best-sized augmentation.
• Method enables reliable multi-stage coconut development monitoring via CT.
- New
- Research Article
- 10.1016/j.eswa.2026.131230
- May 1, 2026
- Expert Systems with Applications
- Xiaoyi Liu + 4 more
Refprogen: a reference-guided molecular generation model with protein-ligand joint representation for property-aware drug design
- New
- Research Article
- 10.1016/j.trc.2026.105619
- May 1, 2026
- Transportation Research Part C: Emerging Technologies
- Azam Ali + 4 more
Investigating the impact of inferring trip purposes in a daily trip generation model
- New
- Research Article
- 10.1021/acs.jcim.6c00181
- Apr 25, 2026
- Journal of chemical information and modeling
- Shogo Nakamura + 2 more
In drug discovery tasks, achieving a balance between high biological activity toward therapeutic targets and synthetic chemical feasibility is critically important. While the recently proposed deep learning-based molecular generation models have enabled explorations of vast chemical spaces, most existing approaches do not consider synthetic routes for generated compounds. To address this issue, TRACE-GFN is proposed for molecular optimization; this method incorporates chemical reaction pathways into a quantitative structure-activity relationship (QSAR)-guided molecular design procedure. The method integrates a transformer model to explicitly learn chemical reactions with a generative flow network (GFlowNet) that efficiently samples diverse candidates. In benchmark experiments involving dopamine receptor D2 (DRD2), AKT serine/threonine kinase 1 (AKT1), and C-X-C motif chemokine receptor 4 (CXCR4), TRACE-GFN demonstrated the ability to identify compounds with high QSAR values while maintaining strong diversity, outperforming the existing molecular generation models. These results demonstrate that the proposed model can efficiently explore promising compounds while accounting for real-world chemical reactions. The source code is publicly available under an MIT license at https://github.com/sekijima-lab/TRACE-GFN.
- New
- Research Article
- 10.1038/s41598-026-50239-1
- Apr 24, 2026
- Scientific reports
- Ömer Uranbey + 2 more
Comparative evaluation of large language models for guideline-compliant abstract generation and readability in dental research: an experimental comparative study.
- New
- Research Article
- 10.1038/s41556-026-01945-5
- Apr 23, 2026
- Nature cell biology
- Robert G Parton + 2 more
Caveolae have long been considered to be an alternative endocytic pathway, with distinct cargoes, but generally similar functions, to clathrin-coated pits. Here we suggest that the mechanisms of caveola formation and their scission are tightly interlinked and rely on specific lipids. These mechanisms are fundamentally different to those driving the formation and fission of coated pits. Both formation and scission of caveolae are driven by lipid-induced shaping of the caveolar domain, and we present biophysical models for lipid-driven curvature generation and its coupling with scission. In addition, we propose that these new insights have important implications for understanding the function of endocytosis mediated by caveolae. Rather than a parallel endocytic pathway for protein cargo, we argue that caveolae are a lipid-sensitive mobilized multifunctional surface domain.
- New
- Research Article
- 10.1007/s11831-026-10599-3
- Apr 23, 2026
- Archives of Computational Methods in Engineering
- Saumya Anand + 2 more
Machine Learning in Household Trip Generation Modelling: A Comprehensive Review
- New
- Research Article
- 10.3390/s26092584
- Apr 22, 2026
- Sensors
- Jie Liu + 3 more
As a vital part of the urban public transportation system, subway passenger flow prediction plays a crucial role in alleviating traffic congestion, improving transportation infrastructure, and optimizing the travel experience. Existing work on subway passenger flow prediction mainly focuses on short-term prediction of inbound/outbound passenger flow and origin-destination (O-D) demand. Subway section passenger flow prediction reflects passenger fluctuations across different line segments more directly and offers robust support for management and resource allocation. We propose a subway section passenger flow generation model and a prediction method based on LTiT (LSTM-TSSA-iTransformer). The model builds on the overall architecture of the iTransformer encoder and employs an LSTM (Long Short-Term Memory) network to capture the temporal characteristics of subway section passenger flow. This is combined with TSSA (Token Statistics Self-Attention) to adaptively weight the information at key time points. The model's performance was evaluated by comparing its predictions with those of other models, including SARIMA (Seasonal Autoregressive Integrated Moving Average), BP neural networks, LightGBM (Light Gradient Boosting Machine), and LSTM. Experimental results show that the proposed model outperforms traditional baseline models on evaluation metrics such as R², MAE, MSE, and MAPE. Finally, we further investigate the selection of input window length and prediction step size, and perform robustness analysis under different noise conditions.
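For readers unfamiliar with the four evaluation metrics named in this abstract, the sketch below computes them using their standard textbook definitions. It is illustrative only and is not taken from the paper: the function name `regression_metrics` and the sample passenger counts are assumptions, and MAPE is undefined when a true value is zero.

```python
def regression_metrics(y_true, y_pred):
    """Standard definitions of MAE, MSE, MAPE (in percent) and R^2.

    Hypothetical helper for illustration; assumes equal-length, non-empty
    sequences, no zero values in y_true, and non-constant y_true.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "MAPE": mape, "R2": r2}


# Made-up section passenger counts, for illustration only.
metrics = regression_metrics([100, 200, 300], [110, 190, 300])
```

Lower MAE, MSE, and MAPE indicate smaller errors, while an R² closer to 1 indicates that the predictions explain more of the variance in the observed flow.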
- New
- Research Article
- 10.3390/ai7040149
- Apr 21, 2026
- AI
- Li Ding + 5 more
With the scaling down of integrated circuit dimensions and the increasing complexity of transistor structures, the role of etching in manufacturing has become increasingly critical. We propose an etching simulation approach based on a video generation model, which models the evolution of the etching process as a video generation task. By embedding frames into quantized latent codeword representations using VQ-VAE (Vector Quantized Variational Autoencoder), injecting physical conditions with a CLIP projection layer, and leveraging a temporal autoregressive prediction model, we propose a generative model of the etching process. We validate the effectiveness of our model on both simulated and experimental data. Our approach achieves a 6000× speedup over the Monte Carlo method while reducing the simulation MAE (Mean Absolute Error) by 14.4% compared with the state-of-the-art video model. Furthermore, results generated by our video-based model show strong agreement with experimental data.
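The quantization step that VQ-VAE models generally rely on (mapping each continuous latent vector to its nearest codebook entry) can be sketched as follows. This is a generic illustration of vector quantization, not the authors' implementation: the function name `quantize`, the toy codebook, and the two-dimensional latents are all assumptions.

```python
def quantize(vectors, codebook):
    """Return, for each vector, the index of the nearest codeword.

    Nearest is measured by squared Euclidean distance. Hypothetical helper
    for illustration; a real VQ-VAE learns the codebook during training.
    """
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


# Toy codebook with three codewords and three latent vectors (made-up values).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [3.5, 4.1]]
codes = quantize(latents, codebook)
```

Each frame is then represented by a grid of such discrete indices, which is the kind of sequence an autoregressive model can be trained to predict over time.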
- New
- Research Article
- 10.3390/electronics15081755
- Apr 21, 2026
- Electronics
- Tao Zhou + 5 more
The diffusion model (DM) is a hot topic in deep generative models and is widely applied in image generation. In diffusion models, there are four main “secrets” that affect high-quality image generation: constructing the diffusion model, improving the sampling velocity, designing the diffusion process, and guiding diffusion models. How should one construct the diffusion model? How can one improve the sampling velocity? How should one design the diffusion process? How should one guide diffusion models? These questions are critical to enhancing diffusion model performance. However, most existing review papers focus on applications, while discussion of the four key technical aspects remains limited. In response, this paper summarizes four key technologies and six representative application directions. First, the basic principles of diffusion models are reviewed from three perspectives: denoising diffusion probabilistic models, noise conditional score network models, and stochastic differential equation models. Second, key techniques for improving sampling velocity are summarized from three perspectives: non-Markovian sampling, knowledge distillation sampling, and discrete optimization sampling. Third, the diffusion process design is summarized from three perspectives: latent space, Transformer-based diffusion, and non-Euclidean space. Fourth, guidance strategies are summarized from three perspectives: classifier guidance, classifier-free guidance, and multimodal guidance. Fifth, the advantages and applications of diffusion models are discussed in high-quality text-to-image generation, high-quality text-to-video generation, and high-quality image-to-image generation. Finally, this paper discusses the challenges faced by diffusion models in image generation. Overall, this review systematically discusses the four “secrets” of diffusion models for image generation and provides a useful reference for future research in this field.
- New
- Research Article
- 10.18372/1990-5548.88.20960
- Apr 18, 2026
- Electronics and Control Systems
- Andrew Sheruda
Automatic generation of clinical reports from medical images is a relevant task capable of reducing the workload of radiologists and standardizing documentation. In this paper, we investigate an approach to generating structured reports from brain MRI data using a pre-trained multimodal SigLIP2 model as a feature extractor. We propose an architecture in which visual embeddings obtained from a frozen SigLIP2 are projected into the representation space of the GPT-2 language model for subsequent text generation. Experiments were conducted on the open-access BIOSE MRI dataset, containing 34 pairs of "MRI image + clinical report". It is shown that the proposed approach generates semantically meaningful reports, achieving quality comparable to more complex architectures with substantially lower computational costs. Additionally, the influence of pre-training SigLIP2 on a classification task (Brain3-Anomaly-SigLIP2 version) on generation quality is investigated. The results demonstrate the potential of using frozen vision encoders in medical generative tasks under data-scarce conditions.
- New
- Research Article
- 10.1021/acs.molpharmaceut.5c01705
- Apr 16, 2026
- Molecular pharmaceutics
- Zijie Gu + 6 more
Messenger RNA (mRNA) technology holds great promise in biomedicine; however, its therapeutic efficacy relies heavily on translation efficiency and the in vivo stability of mRNA sequences, both of which are governed by dynamic interactions between the coding and noncoding regions. Existing optimization approaches typically focus on individual sequence components and lack the capacity for global, synergistic optimization. Moreover, their performance is often limited when processing long sequences or supporting iterative sequence refinement. To overcome these challenges, we propose a unified and interpretable in silico framework for full-length mRNA sequence design, integrating CDS optimization, UTR generation, and degradation modeling. In silico case studies based on respiratory syncytial virus (RSV) vaccine design demonstrated that mRNASyner achieved a favorable balance between translational accessibility and structural stability. mRNASyner thus enables the design of full-length mRNA sequences, supports long-sequence optimization, and offers a novel solution for the development of personalized mRNA therapeutics.