DisasterReliefGPT: Multimodal AI for Autonomous Disaster Impact Assessment and Crisis Communication
DisasterReliefGPT is a multimodal AI system integrating vision, vision-language, and language models for automated disaster impact assessment and crisis communication. It achieves a 78.8% F1-damage score on the xBD dataset, with 81.3% precision for destroyed buildings and 90.7% for undamaged structures, enabling rapid, reliable damage reports for emergency response.
The work presented herein proposes DisasterReliefGPT, a multimodal AI system that automates post-disaster assessment and crisis communication. The system integrates three tightly coupled components: a vision module, DisasterOCS, that detects structural damage in satellite images; a Large Vision–Language Model (LVLM) that strengthens visual understanding and contextual reasoning; and a Large Language Model (LLM) that produces detailed, clear assessment reports. DisasterOCS uses a ResNet34-based encoder with partial weight sharing and event-specific decoders, trained with a custom MultiCrossEntropyDiceLoss function for multi-class segmentation of pre- and post-disaster image pairs. On the xBD benchmark dataset, the system achieves an F1-damage score of 78.8%, identifying destroyed buildings with 81.3% precision and undamaged structures with 90.7% precision. By combining these components, the system lets emergency responders immediately obtain reliable, readable damage assessments that directly support urgent decision-making.
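The abstract names MultiCrossEntropyDiceLoss but does not define it. A common way to build such a loss is to add pixel-wise cross-entropy to a soft multi-class Dice term, and the sketch below assumes exactly that formulation (the `dice_weight` knob is hypothetical). It also sketches how an F1-damage score is usually aggregated on xBD: as the harmonic mean of per-damage-class F1 values, following the xView2 challenge convention. Whether DisasterReliefGPT computes either quantity this way is an assumption.

```python
import math
from typing import List, Sequence

def ce_dice_loss(probs: List[List[float]], labels: List[int],
                 num_classes: int, dice_weight: float = 1.0,
                 eps: float = 1e-7) -> float:
    """Pixel-wise cross-entropy plus (1 - mean soft Dice) over all classes.

    probs[i][c]: predicted probability of class c at pixel i (each row sums to 1).
    labels[i]:   ground-truth class index of pixel i.
    Assumed formulation; not the paper's published MultiCrossEntropyDiceLoss.
    """
    n = len(labels)
    # Cross-entropy: mean negative log-probability assigned to the true class.
    ce = -sum(math.log(max(probs[i][labels[i]], eps)) for i in range(n)) / n
    # Soft Dice per class: 2|P ∩ G| / (|P| + |G|), with P given by probabilities.
    dice_scores = []
    for c in range(num_classes):
        inter = sum(probs[i][c] for i in range(n) if labels[i] == c)
        pred_sum = sum(probs[i][c] for i in range(n))
        true_sum = sum(1 for i in range(n) if labels[i] == c)
        dice_scores.append((2 * inter + eps) / (pred_sum + true_sum + eps))
    dice_loss = 1.0 - sum(dice_scores) / num_classes
    return ce + dice_weight * dice_loss

def f1_damage(per_class_f1: Sequence[float]) -> float:
    """Harmonic mean of per-damage-class F1 scores (xView2-style aggregation)."""
    if any(f <= 0 for f in per_class_f1):
        return 0.0  # the harmonic mean collapses to 0 if any class is fully missed
    return len(per_class_f1) / sum(1.0 / f for f in per_class_f1)
```

The harmonic mean is a deliberate design choice in this kind of metric: one poorly detected damage class (for example, minor damage) pulls the overall score down much more than an arithmetic mean would.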
- Conference Article
- 10.1145/3711875.3729128
- Jun 23, 2025
While large language models (LLMs) are endowed with broad knowledge, their task-specific performance is often suboptimal. Fine-tuning LLMs with task-specific data from diverse nodes is necessary, but this data is typically safeguarded and not shared publicly due to privacy concerns. A common solution involves downstream nodes downloading the LLM locally and fine-tuning it with their proprietary data. However, owners often regard pre-trained LLMs as valuable assets and are reluctant to share them. Additionally, the significant computational resources required by LLMs make local fine-tuning impractical for many nodes. To mitigate these problems, this paper proposes CrossLM, a data-free collaborative fine-tuning framework for large and small language models. CrossLM enables resource-constrained nodes to train smaller language models (SLMs) using their private task-specific data. These SLMs are subsequently leveraged to promote the task-specific natural language generation and understanding capabilities of the LLMs. In turn, the nodes' SLMs are enhanced by the fine-tuned LLMs. In this way, CrossLM avoids sharing private data and proprietary LLMs, and also reduces the resource requirements of nodes. Through extensive experiments across a range of benchmark tasks and popular language models, we demonstrate that CrossLM significantly boosts the task-specific performance of both LLMs and SLMs while preserving the generalization capabilities of LLMs.
- Conference Article
16
- 10.1109/icicis46948.2019.9014842
- Dec 1, 2019
Deep Convolutional Neural Networks (CNNs) achieve promising results for object detection in satellite images, especially for large objects, but small objects in images of the same spatial resolution are not detected nearly as well. In vehicle detection on high-resolution satellite images, for instance, the target object may occupy an area of no more than 15 square pixels, which is too small to have a sufficient effect in the deeper layers. Detection is further complicated by interference from the surrounding background, noise, the shadows of neighboring objects, and varied vehicle colors. This paper presents an analysis of how changing the object size affects detection results. A separate resampling algorithm, independent of the detection model's built-in resampling layer, is applied to the input test images to change their size; this changes the object size and thereby extends the object's impact in the deep layers. Through transfer learning, the pre-trained Faster R-CNN object detection model with Inception-V2 is applied to submeter satellite images, with passenger vehicles as the target objects. The experimental results show how detection accuracy changes with object size.
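The abstract does not name its resampling algorithm. As an assumption, the sketch below uses plain bilinear interpolation on a grayscale image stored as nested lists, and counts above-threshold pixels as a crude proxy for object size. With a 4 × 4 bright square (16 pixels, close to the 15-square-pixel vehicles described above), 2× upsampling grows the object's footprint to 64 pixels before it reaches the detector.

```python
from typing import List

Image = List[List[float]]

def resize_bilinear(img: Image, scale: float) -> Image:
    """Resample a grayscale image by `scale` with bilinear interpolation."""
    h, w = len(img), len(img[0])
    out_h, out_w = max(1, round(h * scale)), max(1, round(w * scale))
    out: Image = []
    for y in range(out_h):
        # Map the output pixel centre back into input coordinates and clamp.
        sy = min(max((y + 0.5) * h / out_h - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = min(max((x + 0.5) * w / out_w - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            # Interpolate horizontally on two rows, then vertically between them.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

def object_area(img: Image, threshold: float = 0.5) -> int:
    """Count pixels brighter than `threshold` (a crude object-size proxy)."""
    return sum(v > threshold for row in img for v in row)
```

For example, an 8 × 8 image whose rows and columns 2 to 5 are bright has `object_area` 16; after `resize_bilinear(img, 2.0)` the 16 × 16 result has `object_area` 64. In a real pipeline the detector's own resizing layer would then map the enlarged image back to its input size, which is why the paper treats its resampling step as separate from the model's.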
- Research Article
3
- 10.1109/access.2024.3419079
- Jan 1, 2024
- IEEE Access
The exceptional general-purpose abilities of large language models have made human-computer conversation commonplace, but in particular industries and verticals these models fall short in depth of expert knowledge and timeliness of information. To provide current information and improved search capabilities, large language models increasingly need to incorporate specialist resources and databases. This research proposes a model for intelligent assisted decision-making that incorporates knowledge from domain-specific databases and real-time data and uses large language models to offer expert tax guidance. By complementing large language models with domain-specific information, the research aims to overcome the limits of general-purpose language models and deliver specialized advice for tax-related inquiries. The results demonstrate that, by offering tax advice tailored to a given situation, the proposed model goes beyond general large language models in the validity of its advice. Our contribution is not only to explore the combination of the tax domain and large language models, but also to propose a new, effective model that government tax departments can use in practice. This study highlights the potential of large language models in real-world professional domains and advances the field of domain-specific human-computer interaction.
- Research Article
- 10.1016/j.imavis.2026.105944
- Mar 1, 2026
- Image and Vision Computing
Autonomous Vehicles (AVs) are transforming the future of transportation through advances in intelligent perception, decision-making, and control systems. However, their success is tied to one core capability: reliable object detection in complex and multimodal environments. While recent breakthroughs in Computer Vision (CV) and Artificial Intelligence (AI) have driven remarkable progress, the field still faces a critical challenge: knowledge remains fragmented across multimodal perception, contextual reasoning, and cooperative intelligence. This survey bridges that gap by delivering a forward-looking analysis of object detection in AVs, emphasizing emerging paradigms such as Vision-Language Models (VLMs), Large Language Models (LLMs), and Generative AI rather than re-examining outdated techniques. We begin by systematically reviewing the fundamental spectrum of AV sensors (camera, ultrasonic, LiDAR, and radar) and their fusion strategies, highlighting not only their capabilities and limitations in dynamic driving environments but also their potential to integrate with recent advances in LLM/VLM-driven perception frameworks. We also review autonomous vehicle simulators as a critical layer for safe development, scalable testing, and reproducible benchmarking of perception and detection pipelines before real-world deployment. Next, we introduce a structured categorization of AV datasets that moves beyond simple collections, positioning ego-vehicle, infrastructure-based, and cooperative datasets (e.g., V2V, V2I, V2X, I2I), followed by a cross-analysis of data structures and characteristics. Finally, we analyze cutting-edge detection methodologies, ranging from 2D and 3D pipelines to hybrid sensor fusion, with particular attention to emerging transformer-driven approaches powered by Vision Transformers (ViTs), Large and Small Language Models (SLMs), and VLMs.
By synthesizing these perspectives, our survey delivers a clear roadmap of current capabilities, open challenges, and future opportunities, highlighting underexplored avenues such as multimodal reasoning, cooperative perception, and foundation-model integration. We aim to establish this work as a definitive reference for researchers, practitioners, and developers, fostering accelerated innovation toward safer and more intelligent autonomous driving systems.
• Comprehensive review of state-of-the-art object detection in autonomous vehicles.
• Analysis of latest AV sensors, fusion strategies, and multimodal perception systems.
• Novel categorization and comparison of ego-vehicle, roadside, and CP datasets.
• In-depth evaluation of 2D, 3D, fusion, and emerging LLM/VLM-based detection methods.
• Highlights open challenges and potential advancements in AV perception research.
- Conference Article
4
- 10.1109/siu.2017.7960727
- May 1, 2017
Abstract: This paper presents a novel solution to the problem of unsupervised change detection in bitemporal satellite images. Information measures that are well known and commonly used in the change detection literature produce blurred change maps and masks without well-defined boundaries, because they are computed locally. In the proposed method, after image registration, radiometric correction, and some further preprocessing steps, mutual information is computed from local joint distributions within over-segments; this is observed to eliminate the sharpness problem. Results, presented in comparison with fundamental approaches, show that the change masks obtained by the proposed method are suitable for different application areas, such as damage assessment of man-made structures after natural disasters and/or urban planning.
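The core idea above, computing mutual information (MI) from a joint distribution restricted to each over-segment instead of a sliding window, can be sketched as follows. Assumptions not in the abstract: pixels are already registered, radiometrically corrected, and quantized to small integers; over-segment labels come from some unspecified segmentation step; and a segment is called "changed" when its MI falls below a hypothetical threshold. Because every pixel in a segment shares one MI value, the resulting mask inherits the segment boundaries, which is where the sharper edges come from. One caveat is that a flat, textureless segment also has near-zero MI, so real use would need extra handling for such regions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Mutual information in bits between two quantized intensity sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi

def segment_mi(pre: Sequence[int], post: Sequence[int],
               segments: Sequence[int]) -> Dict[int, float]:
    """Compute MI separately inside each over-segment (one label per pixel)."""
    groups: Dict[int, Tuple[List[int], List[int]]] = {}
    for p, q, s in zip(pre, post, segments):
        xs, ys = groups.setdefault(s, ([], []))
        xs.append(p)
        ys.append(q)
    return {s: mutual_information(xs, ys) for s, (xs, ys) in groups.items()}

def change_mask(pre: Sequence[int], post: Sequence[int],
                segments: Sequence[int], threshold: float) -> List[bool]:
    """Flag every pixel whose segment has MI below `threshold` as changed."""
    mi = segment_mi(pre, post, segments)
    return [mi[s] < threshold for s in segments]
```

In a toy case where segment 0 keeps the same intensity pattern before and after (MI = 1 bit) and segment 1 is scrambled (MI = 0), a threshold of 0.5 marks only segment 1's pixels as changed.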
- Conference Article
9
- 10.1109/adcons.2013.27
- Dec 1, 2013
The paper discusses the wide variety of ways in which multispectral satellite images are used in coastline and river detection. Flooding is a major problem that causes destruction of natural resources. River detection in satellite images is useful for flood monitoring, tracing sedimentation along river banks, and tracking the drying out of major rivers. Coastline detection is important for coastal zone monitoring and for extracting and analyzing coastline changes caused by the gradual washing out of sand or by abrupt natural calamities. The proposed work presents an approach for detecting rivers and coastlines over water bodies using the Level Set (LS) approach and the Chan-Vese (CV) algorithm. The CV approach was originally designed for medical imaging. In the proposed work, the CV method is modified with respect to its contour smoothing parameters and time step, which further improves the algorithm's accuracy for river and coastline detection. Based on the experimental results, the LS segmentation method is compared with the modified CV model both subjectively and objectively; the objective analysis uses the Dice coefficient, computation time, and Hausdorff distance.
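The two objective measures named above can be sketched with masks represented as sets of (row, column) pixel coordinates, for example the water pixels produced by the LS or modified CV segmentation versus a reference mask. This representation and the brute-force distance search are assumptions for illustration; evaluations on full-size images typically compute Hausdorff distance on boundary pixels with distance transforms or spatial indexes.

```python
import math
from typing import Set, Tuple

Pixel = Tuple[int, int]

def dice_coefficient(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as pixel sets."""
    if not a and not b:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

def hausdorff_distance(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Symmetric Hausdorff distance between two non-empty pixel sets (brute force)."""
    def directed(src: Set[Pixel], dst: Set[Pixel]) -> float:
        # Largest distance from a pixel in src to its nearest pixel in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))
```

Dice rewards overall overlap, while Hausdorff distance reports the single worst boundary disagreement, so the two measures complement each other when comparing coastline contours.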
- Book Chapter
- 10.1201/9781003559115-65
- Jan 29, 2025
The aim of this research on automated vehicle detection in satellite images using the R-CNN algorithm is to develop accurate and efficient methods for identifying and localizing vehicles in high-resolution satellite imagery. The research addresses the growing demand for reliable vehicle detection solutions, with applications spanning urban planning, environmental monitoring, disaster response, and defense. We used the R-CNN algorithm and a CNN algorithm in this investigation, each run for ten iterations (N=10). The two methods were evaluated as two groups, and 100 samples in total were considered in the analysis. The G*Power statistical test was configured with α=0.05 and power=0.85; this configuration follows accepted statistical criteria and gives the investigation a strong capacity to detect statistically significant differences or effects. For automated vehicle detection in satellite images, the custom CNN model exceeded expectations with an accuracy of 97.81%, while the R-CNN model achieved an accuracy of 94.84%, illustrating the CNN model's significant performance advantage. In conclusion, the research and application of automated vehicle detection in satellite images using the R-CNN (Region-based Convolutional Neural Network) algorithm represent a significant leap in the field of computer vision and remote sensing. This technology has far-reaching implications across numerous domains, from urban planning and environmental monitoring to disaster response, defense, and beyond. R-CNN and its variants have demonstrated remarkable accuracy in detecting and localizing vehicles within high-resolution satellite imagery.
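The abstract reports α = 0.05 and power = 0.85 but no effect size, so the calculation G*Power performed cannot be reproduced from it. As a rough illustration only, the sketch below uses the standard normal approximation for a two-sided comparison of two independent means, with a hypothetical effect size `effect_size_d` (Cohen's d) supplied by the caller. G*Power's exact t-test calculation uses the noncentral t distribution and gives a slightly larger n than this approximation.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d: float, alpha: float = 0.05,
                power: float = 0.85) -> int:
    """Approximate per-group sample size for a two-sided, two-sample test of means.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2.
    `effect_size_d` is a hypothetical input; the abstract does not report one.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 1.04 for power = 0.85
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)
```

With a conventionally "large" effect of d = 0.8, `n_per_group(0.8)` returns 29; smaller assumed effects require sharply larger groups, since n grows with 1/d².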
- Discussion
2
- 10.1111/cogs.13430
- Mar 1, 2024
- Cognitive Science
This letter explores the intricate historical and contemporary links between large language models (LLMs) and cognitive science through the lens of information theory, statistical language models, and socioanthropological linguistic theories. The emergence of LLMs highlights the enduring significance of information-based and statistical learning theories in understanding human communication. These theories, initially proposed in the mid-20th century, offered a visionary framework for integrating computational science, social sciences, and humanities, which nonetheless was not fully realized at that time. The subsequent development of sociolinguistics and linguistic anthropology, especially since the 1970s, provided critical perspectives and empirical methods that both challenged and enriched this framework. This letter proposes that two pivotal concepts derived from this development, metapragmatic function and indexicality, offer a fruitful theoretical perspective for integrating the semantic, textual, and pragmatic, contextual dimensions of communication, an amalgamation that contemporary LLMs have yet to fully achieve. The author believes that contemporary cognitive science is at a crucial crossroads, where fostering interdisciplinary dialogue among computational linguistics, sociolinguistics and linguistic anthropology, and cognitive and social psychology is particularly imperative. Such collaboration is vital to bridge the computational, cognitive, and sociocultural aspects of human communication and human-AI interaction, especially in the era of large language and multimodal models and human-centric Artificial Intelligence (AI).
- Conference Article
6
- 10.18653/v1/2024.findings-acl.365
- Jan 1, 2024
Despite the ubiquity of large language models (LLMs) in AI research, the question of embodiment in LLMs remains underexplored, distinguishing them from embodied systems in robotics where sensory perception directly informs physical action. Our investigation navigates the intriguing terrain of whether LLMs, despite their non-embodied nature, effectively capture implicit human intuitions about fundamental, spatial building blocks of language. We employ insights from spatial cognitive foundations developed through early sensorimotor experiences, guiding our exploration through the reproduction of three psycholinguistic experiments. Surprisingly, correlations between model outputs and human responses emerge, revealing adaptability without a tangible connection to embodied experiences. Notable distinctions include polarized language model responses and reduced correlations in vision language models. This research contributes to a nuanced understanding of the interplay between language, spatial experiences, and the computations made by large language models.
- Research Article
11
- 10.1287/ijds.2023.0007
- Apr 1, 2023
- INFORMS Journal on Data Science
How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
- Conference Article
135
- 10.1145/3510003.3510203
- May 21, 2022
Large pre-trained language models such as GPT-3 [10], Codex [11], and Google's language model [7] are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and caution. On the optimistic side, such large language models have the potential to improve productivity by providing an automated AI pair programmer for every programmer in the world. On the cautionary side, since these large language models do not understand program semantics, they offer no guarantees about the quality of the suggested code. In this paper, we present an approach to augment these large language models with post-processing steps based on program analysis and synthesis techniques that understand the syntax and semantics of programs. Further, we show that such techniques can make use of user feedback and improve with usage. We present our experiences from building and evaluating such a tool, Jigsaw, targeted at synthesizing code for the Python Pandas API using multi-modal inputs. Our experience suggests that as these large language models evolve for synthesizing code from intent, Jigsaw has an important role to play in improving the accuracy of the systems.
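One simple form of the post-processing described above is to execute each LLM-generated candidate against input/output examples supplied by the user and keep only the candidates that pass. The sketch below shows just that filtering idea; it is an assumption about one stage, not Jigsaw's implementation, which the abstract says also applies program analysis and synthesis and learns from user feedback. Plain Python functions stand in for Pandas code, and the helper names are hypothetical.

```python
import copy
from typing import Any, Dict, List, Sequence, Tuple

Example = Tuple[Tuple[Any, ...], Any]  # (positional arguments, expected result)

def passes_examples(source: str, func_name: str,
                    examples: Sequence[Example]) -> bool:
    """Run one candidate snippet and check it against input/output examples."""
    namespace: Dict[str, Any] = {}
    try:
        exec(source, namespace)  # unsafe for untrusted code; sandbox in practice
        fn = namespace[func_name]
        # Deep-copy inputs so an in-place mutation cannot corrupt later checks.
        return all(fn(*copy.deepcopy(args)) == expected
                   for args, expected in examples)
    except Exception:
        return False  # syntax errors, missing names, or runtime errors all fail

def filter_candidates(candidates: Sequence[str], func_name: str,
                      examples: Sequence[Example]) -> List[str]:
    """Keep only the generated candidates that satisfy every example."""
    return [c for c in candidates if passes_examples(c, func_name, examples)]
```

Given candidates for a function `f` that should sort a list and the example `(([3, 1, 2],), [1, 2, 3])`, a candidate returning `sorted(xs)` survives, while one that reverses the list or sorts in place and returns `None` is discarded.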
- Research Article
12
- 10.1016/j.procs.2023.09.086
- Jan 1, 2023
- Procedia Computer Science
A Large and Diverse Arabic Corpus for Language Modeling
- Conference Article
3
- 10.1109/icpr.1994.576236
- Oct 9, 1994
- Research Article
4
- 10.1038/s41698-025-00916-7
- May 23, 2025
- npj Precision Oncology
Large language models (LLMs) and large visual-language models (LVLMs) have exhibited near-human levels of knowledge, image comprehension, and reasoning abilities, and their performance has been evaluated in some healthcare domains. However, a systematic evaluation of their capabilities in cervical cytology screening has yet to be conducted. Here, we constructed CCBench, a benchmark dataset dedicated to the evaluation of LLMs and LVLMs in cervical cytology screening, and developed a GPT-based semi-automatic evaluation pipeline to assess the performance of six LLMs (GPT-4, Bard, Claude-2.0, LLaMa-2, Qwen-Max, and ERNIE-Bot-4.0) and five LVLMs (GPT-4V, Gemini, LLaVA, Qwen-VL, and ViLT) on this dataset. CCBench comprises 773 question-answer (QA) pairs and 420 visual-question-answer (VQA) triplets, making it the first dataset in cervical cytology to include both QA and VQA data. We found that LLMs and LVLMs demonstrate promising accuracy and specialization in cervical cytology screening. GPT-4 achieved the best performance on the QA dataset, with an accuracy of 70.5% for close-ended questions and an average expert evaluation score of 6.9/10 for open-ended questions. On the VQA dataset, Gemini achieved the highest accuracy for close-ended questions at 67.8%, while GPT-4V attained the highest expert evaluation score of 6.1/10 for open-ended questions. In addition, LLMs and LVLMs showed varying abilities in answering questions across different topics and difficulty levels. However, their performance remains inferior to the expertise exhibited by cytopathology professionals, and the risk of generating misinformation could lead to potential harm. Therefore, substantial improvements are required before these models can be reliably deployed in clinical practice.
- Supplementary Content
- 10.1108/ir-02-2025-0074
- Jul 29, 2025
- Industrial Robot: the international journal of robotics research and application
Purpose: This study aims to explore the integration of large language models (LLMs) and vision-language models (VLMs) in robotics, highlighting their potential benefits and the safety challenges they introduce, including robustness issues, adversarial vulnerabilities, privacy concerns and ethical implications.
Design/methodology/approach: This survey conducts a comprehensive analysis of the safety risks associated with LLM- and VLM-powered robotic systems. The authors review existing literature, analyze key challenges, evaluate current mitigation strategies and propose future research directions.
Findings: The study identifies that ensuring the safety of LLM-/VLM-driven robots requires a multi-faceted approach. While current mitigation strategies address certain risks, gaps remain in real-time monitoring, adversarial robustness and ethical safeguards.
Originality/value: This study offers a structured and comprehensive overview of the safety challenges in LLM-/VLM-driven robotics. It contributes to ongoing discussions by integrating technical, ethical and regulatory perspectives to guide future advancements in safe and responsible artificial intelligence-driven robotics.