Articles published on Multi-modal Models
7389 Search results
- New
- Research Article
- 10.1016/j.ejrai.2026.100089
- Jun 1, 2026
- European Journal of Radiology Artificial Intelligence
- Jeremy Hofmeister + 10 more
Background: Current AI models that rely solely on chest radiographs (CXRs) or on clinical and biological data have limitations. Our goal was to determine whether adding clinical and biological data to CXR data improved accuracy in the diagnosis of pneumonia. Methods: This retrospective study compared three AI models: an imaging model based on a convolutional neural network (CNN) trained on CXRs alone; a clinico-biological model based on a support vector machine (SVM) using clinical and biological data with no CXR; and a multimodal model integrating all three types of information. Data were extracted from two independent cohorts: a training set (PneumOld-CT, n = 200, median age 84 years (78.6-90.2)) and an independent test set (PACSCAN, n = 230, mean age 65 years +/- 20), for whom the reference diagnosis was determined a posteriori by a multidisciplinary expert panel using multimodal data. We assessed diagnostic performance by the area under the receiver operating characteristic curve (ROC-AUC) and compared the models using DeLong's test. Calibration curves and decision curve analysis (DCA) were also evaluated. Results: In the independent test set, the multimodal AI model demonstrated significantly higher ROC-AUC than the imaging-based model (p < 0.05) or the clinico-biological model (p < 0.005). DCA confirmed a greater net clinical benefit for the multimodal model. Conclusion: Integrating radiographic, clinical and biological data to develop a multimodal AI model significantly improved pneumonia diagnosis compared to a single- or dual-modality AI model. This multimodal approach has the potential to improve diagnostic support, especially in complex clinical scenarios.
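The two evaluation quantities named in this abstract, ROC-AUC and the net benefit used in decision curve analysis, can be illustrated with a minimal sketch. The function names and the toy labels and scores below are hypothetical and are not taken from the study; DeLong's test and the calibration curves are not reproduced here.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Net benefit at probability threshold pt: TP/N - (FP/N) * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = pneumonia, 0 = no pneumonia.
labels = [1, 1, 1, 0, 0, 0]
imaging_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]      # assumed single-modality output
multimodal_scores = [0.9, 0.8, 0.6, 0.5, 0.3, 0.1]   # assumed multimodal output

for name, scores in [("imaging", imaging_scores), ("multimodal", multimodal_scores)]:
    print(name, round(roc_auc(labels, scores), 3), round(net_benefit(labels, scores, 0.5), 3))
```

A decision curve repeats the `net_benefit` calculation over a range of thresholds and plots the result for each model alongside the "treat all" and "treat none" strategies.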
- New
- Research Article
- 10.1016/j.ejrad.2026.112758
- Jun 1, 2026
- European journal of radiology
- Zhiqiang Wan + 8 more
Multi-modal deep learning model for predicting recurrence of moderately severe and severe acute pancreatitis.
- New
- Research Article
- 10.1016/j.segan.2026.102207
- Jun 1, 2026
- Sustainable Energy, Grids and Networks
- G Cirrincione + 6 more
The worldwide effort to reach carbon peak and neutrality objectives alongside energy market expansion has sped up the integration of renewable energy sources such as wind and solar power. This shift introduces substantial uncertainties in power system scheduling and control processes, which test the limits of existing theoretical methods. The advanced reasoning and data-processing capabilities of Large Language Models (LLMs), with particular reference to their ability to analyze multimodal data, provide transformative potential for managing and controlling smart grids. This review examines how LLMs can tackle modern power system challenges while confirming their fit with the power sector’s expanding dependency on Artificial Intelligence (AI) technologies. We assess the requirements of modern power systems for such AI-based solutions, while evaluating how LLMs shape grid management and exploring their enabling technologies, such as model architecture and training methods, along with necessary data. Our review investigates how multimodal LLM technology serves different smart grid functions, including generation, transmission, distribution, consumption, and equipment management, to exhibit its adaptable nature in strengthening grid resilience and efficiency.
• This review explores the role of multimodal Large Language Models (LLMs) in smart grid management, showing how their ability to integrate and process different types of data, including sensor readings, text logs, weather forecasts, and equipment images, can significantly improve decision-making, fault diagnosis, and operational planning in power systems.
• The study analyzes the architectural and training aspects of multimodal LLMs, including the use of pretrained modular encoders, efficient fine-tuning methods such as Low-Rank Adaptation (LoRA), and specialized loss functions, highlighting how these techniques enable adaptation to the specific needs of smart grid applications without lengthy retraining.
• Practical considerations for industrial implementation are examined, covering multimodal data collection and preprocessing, domain-specific knowledge integration, intelligent task decomposition, and system-level integration, illustrating how LLMs can be seamlessly integrated into power system operating environments.
• The review highlights the potential of multimodal LLMs to improve the resilience of the power grid, optimize the integration of renewable energy, and support human-machine collaboration, while outlining future research directions, such as domain-specific base models, physics-based architectures, and human-in-the-loop feedback, in order to further improve reliability and interpretability in critical infrastructure applications.
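As a rough illustration of why Low-Rank Adaptation avoids lengthy retraining, the sketch below shows the generic LoRA formulation: a frozen weight matrix W is adjusted by a scaled low-rank product, W + (alpha / r) · B · A, and only A and B are trained. The helper names, matrix sizes, and the value of alpha are assumptions for illustration and do not come from the review.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the rank (number of rows of A)."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def trainable_parameter_counts(d_out, d_in, r):
    """Parameters updated by full fine-tuning versus by LoRA at rank r."""
    return d_out * d_in, r * (d_out + d_in)


random.seed(0)
d_out, d_in, r = 4, 3, 1
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weights
a = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]      # trainable, random init
b = [[0.0 for _ in range(r)] for _ in range(d_out)]                    # trainable, zero init

# With B initialised to zero, the adapted layer starts identical to the frozen one.
assert lora_effective_weight(w, a, b, alpha=2.0) == w

full, lora = trainable_parameter_counts(4096, 4096, 8)
print(full, lora)  # 16777216 versus 65536 trainable parameters
```

For a 4096 × 4096 layer at rank 8, the adapter trains well under 1% of the parameters that full fine-tuning would update, which is the source of the efficiency the review refers to.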
- New
- Research Article
- 10.1016/j.jbi.2026.105017
- Jun 1, 2026
- Journal of biomedical informatics
- Zaifu Zhan + 10 more
Retrieval-augmented in-context learning for multimodal large language models in disease classification.
- New
- Research Article
- 10.1016/j.foodchem.2026.149208
- Jun 1, 2026
- Food chemistry
- Haohan Ding + 8 more
Multimodal large language models for food safety detection within deep learning frameworks: a review.
- New
- Research Article
- 10.1016/j.ejrad.2026.112784
- Jun 1, 2026
- European journal of radiology
- Yangyang Ou + 17 more
Multimodal therapeutic efficacy model for predicting early treatment response to TACE-HAIC combined with immune checkpoint inhibitors and tyrosine kinase inhibitors in unresectable hepatocellular carcinoma.
- New
- Research Article
- 10.1016/j.saa.2026.127623
- Jun 1, 2026
- Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy
- Yu Sun + 6 more
A transformer and 3D CNN-based feature fusion network with interpretable ability for Raman spectra analysis: improving the diagnosis of thyroid cancer.
- New
- Research Article
- 10.26599/tst.2025.9010178
- Jun 1, 2026
- Tsinghua Science and Technology
- Qiujing Lu + 6 more
To guarantee the safety and reliability of Autonomous Vehicle (AV) systems, corner cases play a crucial role in exploring the system’s behavior under rare and challenging conditions within simulation environments. However, current approaches often fall short in meeting diverse testing needs and struggle to generalize to novel, high-risk scenarios that closely mirror real-world conditions. To tackle this challenge, we present AutoScenario, a multimodal Large Language Model (LLM)-based framework for realistic corner case generation. It converts safety-critical real-world data from multiple sources into textual representations, enabling the generalization of key risk factors while leveraging the extensive world knowledge and advanced reasoning capabilities of LLMs. Furthermore, it integrates tools from the Simulation of Urban Mobility (SUMO) and Car Learning to Act (CARLA) simulators to automatically interpret and execute the scenario code generated by LLMs. Our experiments demonstrate that AutoScenario can generate realistic and challenging test scenarios, precisely tailored to specific testing requirements or textual descriptions. Additionally, we validated its ability to produce diverse and novel scenarios derived from multimodal real-world data involving risky situations, harnessing the powerful generalization capabilities of LLMs to effectively simulate a wide range of corner cases. The implementation is available at https://github.com/THU-AI-Testing/AutoScenario.
- New
- Research Article
- 10.1016/j.micpath.2026.108459
- Jun 1, 2026
- Microbial pathogenesis
- Qinghong Du + 4 more
Virtual cell/virtual cell like: The key to unlocking a new era of HIV/AIDS treatment?
- New
- Research Article
- 10.1016/j.media.2026.104072
- Jun 1, 2026
- Medical image analysis
- Donggen Fang + 5 more
Incorporating modality-specific intensity prior as text prompt for multimodal myocardial pathology segmentation.
- New
- Research Article
- 10.1016/j.jmsy.2026.04.032
- Jun 1, 2026
- Journal of Manufacturing Systems
- Yue Zhao + 3 more
Intelligent generation of 3D disassembly processes driven by multimodal large language models
- New
- Research Article
- 10.1016/j.neunet.2026.108726
- Jun 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Langtao Zhou + 10 more
Multimodal hybrid mamba classification model for tumor pathological grade prediction using magnetic resonance images.
- New
- Research Article
- 10.1016/j.jad.2026.121259
- Jun 1, 2026
- Journal of affective disorders
- Soonho Ha + 5 more
Multimodal machine learning models for predicting remission in major depressive disorder using clinical data, blood biomarkers, and DNA methylation.
- New
- Research Article
- 10.1016/j.drugalcdep.2026.113128
- Jun 1, 2026
- Drug and alcohol dependence
- Linqi Lu + 9 more
Foodie traps within Facebook cannabis promotional posts: Deploying multimodal deep learning AIs to monitor audience engagement.
- New
- Research Article
- 10.1097/mnm.0000000000002125
- Jun 1, 2026
- Nuclear medicine communications
- Yuang Liu + 8 more
This study explored the predictive value of 18F-fluorodeoxyglucose (FDG) PET/computed tomography (CT) radiomics for assessing programmed death-ligand 1 (PD-L1) expression in non-small cell lung cancer (NSCLC), aiming to noninvasively evaluate PD-L1 status and assist in selecting patients for immunotherapy. We retrospectively analyzed 163 NSCLC patients with pretreatment 18F-FDG PET/CT scans, randomly assigning them into training (n = 130) and validation (n = 33) cohorts. Optimal radiomics features were selected via least absolute shrinkage and selection operator and combined with clinical factors to construct five predictive models: CT, PET, radiomics, clinical, and a combined model. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA). All models showed predictive ability for PD-L1 expression. The combined model demonstrated superior performance, with AUCs of 0.839 [95% confidence interval (CI): 0.771-0.908] in training and 0.782 (95% CI: 0.610-0.954) in validation. Calibration curves indicated good agreement between predicted and observed probabilities (Brier scores: 0.163 and 0.191, respectively). DCA confirmed the highest net clinical benefit for the combined model. The multimodal combined model, integrating PET/CT radiomics with clinical factors, shows significant potential for noninvasively predicting PD-L1 expression in NSCLC, offering a novel strategy for precise patient selection for anti-PD-L1 immunotherapy.
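For readers unfamiliar with the Brier scores quoted in this abstract, the following minimal sketch shows how the metric is computed: it is the mean squared difference between predicted probabilities and observed binary outcomes, so lower is better. The function name and the toy predictions are hypothetical and are not data from the study.

```python
def brier_score(labels, probabilities):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(labels) != len(probabilities) or not labels:
        raise ValueError("labels and probabilities must be non-empty and equal in length")
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)


# Hypothetical toy data: 1 = PD-L1 positive, 0 = PD-L1 negative.
labels = [1, 0, 1, 0]
predicted = [0.9, 0.2, 0.7, 0.4]
print(round(brier_score(labels, predicted), 3))

# Reference point: predicting 0.5 for every patient always yields 0.25.
uninformative = brier_score(labels, [0.5] * len(labels))
print(uninformative)
```

Against that 0.25 reference point for an uninformative constant prediction, the reported values of 0.163 and 0.191 indicate that the model's probabilities carry real information.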
- New
- Research Article
- 10.1016/j.bspc.2026.109737
- Jun 1, 2026
- Biomedical Signal Processing and Control
- Xingyu Zhang + 6 more
A Fine-tuning Multimodal Large Language Model for Endoscopic Report Generation
- New
- Research Article
- 10.1016/j.neunet.2026.108575
- Jun 1, 2026
- Neural networks : the official journal of the International Neural Network Society
- Yang Shao + 5 more
Exploring cognitive workload recognition using CogRepLKNet with EEG-fMRI.
- New
- Research Article
- 10.1016/j.inffus.2025.104115
- Jun 1, 2026
- Information Fusion
- Haoyu Wang + 9 more
A hierarchical information policy fusion framework with multimodal large language models for autonomous guidewire navigation in endovascular procedures
- New
- Research Article
- 10.1016/j.pnpbp.2026.111702
- Jun 1, 2026
- Progress in neuro-psychopharmacology & biological psychiatry
- Simin Kang + 5 more
Predicting adult functional outcomes in childhood-onset attention-deficit/hyperactivity disorder using multimodal MRI and machine learning: A prospective follow-up study.
- New
- Research Article
- 10.1016/j.bspc.2026.109793
- Jun 1, 2026
- Biomedical Signal Processing and Control
- Hermes Javier Mora + 2 more
Advanced forecasting of driver drowsiness events: Non-intrusive data and multimodal BiLSTM-based modeling