Application of Large Language Models in Geotechnical Engineering: A Movement Towards Safe and Sustainable Future

Abstract

Over the last two decades, geotechnical engineering has undergone a paradigm shift driven by advances in sensing, communication, and data-driven techniques. These advances have enhanced the safety and reliability of geotechnical infrastructure through real-time monitoring and automated decision-making. Recently, Large Language Models (LLMs) have emerged as advanced data-driven techniques contributing to automated risk assessment of geotechnical infrastructure. LLMs are advanced deep learning models widely used to solve complex numerical problems, analyze large volumes of data, and generate human language. This paper presents a critical review of the application of LLMs in geotechnical engineering. The integration of LLMs into geotechnical engineering has demonstrated significant advances in slope stability analysis, bearing capacity computation, numerical analysis, soil–structure interaction, and underground infrastructure. By summarizing the latest research findings and practical applications, this paper underscores the potential of LLMs to advance and automate various processes in geotechnical engineering. The findings not only provide insights into current LLM-based geotechnical practices but also emphasize the instrumental role that LLMs can play in advancing geotechnical engineering, ultimately ensuring a safer and more sustainable future. Lastly, this paper highlights the LLM capabilities that can be used to empower geotechnical engineers.

Similar Papers
  • Conference Article
  • 10.37308/dfi49.202430102
A New Perspective and Progressive Applications of the Large Language Models in Geotechnical Engineering Field
  • Oct 6, 2024
  • Seok Hyeon Chai

The use of Natural Language Processing (NLP) in engineering seems complex, as many applications require calculations and engineering judgement. Large Language Models (LLMs) have been shown to be among the most effective solutions in many fields, from arts and literature to finance and even software development. However, the use of LLMs in Civil Engineering, and specifically in Geotechnical Engineering, has been discussed less often than in other fields that use the Application Programming Interfaces (APIs) of OpenAI. This paper covers the framework for fine-tuning LLMs on a domain-specific dataset, specifically one related to settlement analysis, ground improvement, and slope stability analysis, and presents results generated by a model fine-tuned on research papers and technical documents provided by geotechnical engineers. The paper has three parts. First, it introduces LLMs and pre-trained models and explains the choice of models used. Second, it discusses the fine-tuning process with a domain-specific database. Third, it provides benchmarks and performance metrics using Recall-Oriented Understudy for Gisting Evaluation (ROUGE), validating the outputs by comparing them with prompt-completion pairs. With sufficient prompt-completion pairs from experts in the field, the scientific literature, and localized technical documents, the LLM trained on geotechnical engineering data provides reasonable responses that would not otherwise have been achieved with the original ChatGPT model. This paper introduces an innovative way to manage the current database of geotechnical documents and literature so that it can be optimized for use with LLMs. The paper also discusses the comparative results and the future potential capabilities of its fine-tuned model.
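To make the validation step concrete, the following is a minimal sketch of a ROUGE-1 style score (unigram overlap) between a model output and a reference completion. It is not the authors' implementation: the function name `rouge_1`, the whitespace tokenization, and the example sentences are all assumptions made for illustration, and a real evaluation would likely use an established ROUGE package with proper tokenization and stemming.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 using lowercase whitespace tokens (a simplification)."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical prompt-completion pair, invented for illustration only.
reference = "preloading with vertical drains accelerates consolidation settlement"
candidate = "vertical drains with preloading accelerate consolidation"
scores = rouge_1(candidate, reference)
print(scores)
```

Recall here is the share of reference tokens that the model output recovers, which is the quantity the "Recall-Oriented" part of the name refers to; precision and F1 are included because they are commonly reported alongside it.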

  • Supplementary Content
  • 10.1108/ir-02-2025-0074
Large language and vision-language models for robot: safety challenges, mitigation strategies and future directions
  • Jul 29, 2025
  • Industrial Robot: the international journal of robotics research and application
  • Xiangyu Hu + 1 more

Purpose: This study aims to explore the integration of large language models (LLMs) and vision-language models (VLMs) in robotics, highlighting their potential benefits and the safety challenges they introduce, including robustness issues, adversarial vulnerabilities, privacy concerns and ethical implications. Design/methodology/approach: This survey conducts a comprehensive analysis of the safety risks associated with LLM- and VLM-powered robotic systems. The authors review existing literature, analyze key challenges, evaluate current mitigation strategies and propose future research directions. Findings: The study identifies that ensuring the safety of LLM-/VLM-driven robots requires a multi-faceted approach. While current mitigation strategies address certain risks, gaps remain in real-time monitoring, adversarial robustness and ethical safeguards. Originality/value: This study offers a structured and comprehensive overview of the safety challenges in LLM-/VLM-driven robotics. It contributes to ongoing discussions by integrating technical, ethical and regulatory perspectives to guide future advancements in safe and responsible artificial intelligence-driven robotics.

  • Research Article
  • Cite count: 5
  • 10.1016/j.geoai.2025.100036
Perspectives: LLM agents reshaping the foundation of geotechnical problem-solving
  • Sep 1, 2025
  • Geodata and AI
  • Stephen Wu + 8 more

  • Discuss potential of Agentic AI application in Geotechnics.
  • Review the 1st GeoTechathon event.
  • Emphasize the importance of community effort for the future of Geotechnics.

This paper explores the transformative potential of Large Language Model (LLM)-based agentic artificial intelligence (AI) in addressing longstanding challenges in geotechnical engineering. It begins by highlighting the significant growth and increasing interest in applying machine learning (ML) and AI techniques across various geotechnical domains, such as soil classification, slope stability analysis, and foundation design. Emphasizing the Gartner Hype Cycle, the authors reflect on the transition from initial enthusiasm toward realistic appraisal and adoption, highlighting current barriers like limited foundational understanding, skepticism about AI reliability, and a lack of standardized practices. The authors then introduce LLM agents as promising solutions for automating the extraction, interpretation, and quantification of qualitative and semi-quantitative geotechnical data. Drawing insights from the 1st GeoTechathon event, an international collaboration involving engineers, data scientists, and AI practitioners, the paper demonstrates practical applications in geotechnical site planning, landslide investigations, liquefaction analysis, and shield tunnel safety evaluation. Each project leveraged basic techniques, including Retrieval-Augmented Generation (RAG), multimodal data integration, and prompt engineering, achieving improvements in efficiency, accuracy, and decision-making processes. The paper concludes by discussing broader implications for interdisciplinary collaboration, ethical considerations, and future directions, emphasizing the necessity for standardized practices, rigorous validation, and enhanced AI literacy to sustainably integrate LLM technologies within the geotechnical engineering community.

  • Research Article
  • Cite count: 12
  • 10.1016/j.procs.2023.09.086
A Large and Diverse Arabic Corpus for Language Modeling
  • Jan 1, 2023
  • Procedia Computer Science
  • Abbas Raza Ali + 3 more

  • Research Article
  • Cite count: 21
  • 10.1093/jamia/ocae090
The first step is the hardest: pitfalls of representing and tokenizing temporal data for large language models.
  • Jul 1, 2024
  • Journal of the American Medical Informatics Association : JAMIA
  • Dimitris Spathis + 1 more

Large language models (LLMs) have demonstrated remarkable generalization across diverse tasks, leading individuals to increasingly use them as personal assistants due to their emerging reasoning capabilities. Nevertheless, a notable obstacle emerges when including numerical/temporal data in prompts, such as data sourced from wearables or electronic health records. LLMs employ tokenizers in their input that break down text into smaller units. However, tokenizers are not designed to represent numerical values and might struggle to understand repetitive patterns and context, treating consecutive values as separate tokens and disregarding their temporal relationships. This article discusses the challenges of representing and tokenizing temporal data. It argues that naively passing timeseries to LLMs can be ineffective due to the modality gap between numbers and text. We conduct a case study by tokenizing a sample mobile sensing dataset using the OpenAI tokenizer. We also review recent works that feed timeseries data into LLMs for human-centric tasks, outlining common experimental setups like zero-shot prompting and few-shot learning. The case study shows that popular LLMs split timestamps and sensor values into multiple nonmeaningful tokens, indicating they struggle with temporal data. We find that preliminary works rely heavily on prompt engineering and timeseries aggregation to "ground" LLMs, hinting that the "modality gap" hampers progress. The literature was critically analyzed through the lens of models optimizing for expressiveness versus parameter efficiency. On one end of the spectrum, training large domain-specific models from scratch is expressive but not parameter-efficient. On the other end, zero-shot prompting of LLMs is parameter-efficient but lacks expressiveness for temporal data. We argue tokenizers are not optimized for numerical data, while the scarcity of timeseries examples in training corpora exacerbates difficulties. 
We advocate balancing model expressiveness and computational efficiency when integrating temporal data. Prompt tuning, model grafting, and improved tokenizers are highlighted as promising directions. We underscore that despite promising capabilities, LLMs cannot meaningfully process temporal data unless the input representation is addressed. We argue that this paradigm shift in how we leverage pretrained models will particularly affect the area of biomedical signals, given the lack of modality-specific foundation models.

  • Research Article
  • Cite count: 26
  • 10.1080/17499518.2024.2381026
Future-proofing geotechnics workflows: accelerating problem-solving with large language models
  • Jul 25, 2024
  • Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards
  • Stephen Wu + 18 more

The integration of Large Language Models (LLMs), such as ChatGPT, into the workflows of geotechnical engineering has a high potential to transform how the discipline approaches problem-solving and decision-making. This paper investigates the practical uses of LLMs in addressing geotechnical challenges based on opinions from a diverse group, including students, researchers, and professionals from academia, industry, and government sectors gathered from a workshop dedicated to this study. After introducing the key concepts of LLMs, we present preliminary LLM solutions for four distinct practical geotechnical problems as illustrative examples. In addition to the basic text generation ability, each problem is designed to cover different extended functionalities of LLMs that cannot be achieved by conventional machine learning tools, including multimodal modelling under a unified framework, programming ability, knowledge extraction, and text embedding. We also address the potentials and challenges in implementing LLMs, particularly in achieving high precision and accuracy in specialised tasks, and underscore the need for expert oversight. The findings demonstrate the effectiveness of LLMs in enhancing efficiency, data processing, and decision-making in geotechnical engineering, suggesting a paradigm shift towards more integrated, data-driven approaches in this field.

  • Research Article
  • Cite count: 22
  • 10.1016/j.compgeo.2024.106849
GeoLLM: A specialized large language model framework for intelligent geotechnical design
  • Oct 24, 2024
  • Computers and Geotechnics
  • Hao-Ruo Xu + 3 more

  • Research Article
  • Cite count: 1
  • 10.7498/aps.74.20250497
Material design accelerated by large language models: end-to-end empowerment from knowledge mining to intelligent design
  • Jan 1, 2025
  • Acta Physica Sinica
  • Yudan Huang + 8 more

With the rapid development of artificial intelligence technology, large language models (LLMs) have become the core driving force for the paradigm shift in materials science research. This review explores the comprehensive role of LLMs in accelerating material design throughout the entire research lifecycle from knowledge mining to intelligent design. This work aims to emphasize how LLMs can leverage their advantages in information retrieval, cross-modal data integration, and intelligent reasoning to address challenges in traditional materials research, such as data fragmentation, high experimental costs, and limited reasoning capabilities. Key methods include applying LLMs to knowledge discovery through techniques such as retrieval-augmented generation (RAG), multi-modal information retrieval, and knowledge graph construction. These approaches can efficiently extract and construct material data from a vast repository of scientific literature and experimental records. Additionally, LLMs are integrated with automated experimental platforms to optimize workflows from natural language-driven experiment design to high-throughput iterative testing. The results demonstrate that LLMs significantly enhance material research efficiency and accuracy. For instance, in knowledge mining, LLMs improve information retrieval accuracy by up to 29.4% in tasks such as predicting material synthesis conditions. In material design, LLMs can accelerate computational modeling, structure and performance prediction, and reverse engineering, reducing experimental trial-and-error cycles. Notably, LLMs perform well in cross-scale knowledge integration, linking material composition, processing parameters, and performance metrics to guide innovative synthesis pathways. However, challenges still exist, including dependence on high-quality data, the “black-box” nature of LLMs, and limitations in handling complex material systems. The future direction emphasizes improving data quality through multi-source integration, enhancing model explainability through visualization tools, deepening interdisciplinary collaboration, and bridging the gaps between AI and domain-specific expertise. In summary, LLMs are reshaping materials science by implementing a data-driven, knowledge-intensive research paradigm. The ability of LLMs to integrate vast datasets, predict material properties, and automate experimental workflows makes them indispensable tools for accelerating material discovery and innovation. With the development of LLMs, their synergistic effect with physical constraints and experimental platforms is expected to open new fields in material design.

  • Research Article
  • Cite count: 1
  • 10.1111/cogs.70106
Can Large Language Models Simulate Spoken Human Conversations?
  • Sep 1, 2025
  • Cognitive Science
  • Eric Mayor + 2 more

Large language models (LLMs) can emulate many aspects of human cognition and have been heralded as a potential paradigm shift. They are proficient in chat‐based conversation, but little is known about their ability to simulate spoken conversation. We investigated whether LLMs can simulate spoken human conversation. In Study 1, we compared transcripts of human telephone conversations from the Switchboard (SB) corpus to six corpora of transcripts generated by two powerful LLMs, GPT‐4 and Claude Sonnet 3.5, and two open‐source LLMs, Vicuna and Wayfarer, using different prompts designed to mimic SB participants’ instructions. We compared LLM and SB conversations in terms of alignment (conceptual, syntactic, and lexical), coordination markers, and coordination of openings and closings. We also documented qualitative features by which LLM conversations differ from SB conversations. In Study 2, we assessed whether humans can distinguish transcripts produced by LLMs from those of SB conversations. LLM conversations exhibited exaggerated alignment (and an increase in alignment as conversation unfolded) relative to human conversations, different and often inappropriate use of coordination markers, and were dissimilar to human conversations in openings and closings. LLM conversations did not consistently pass for SB conversations. Spoken conversations generated by LLMs are both qualitatively and quantitatively different from those of humans. This issue may evolve with better LLMs and more training on spoken conversation, but may also result from key differences between spoken conversation and chat.

  • Research Article
  • Cite count: 13
  • 10.1007/s10462-025-11328-1
Content moderation by LLM: from accuracy to legitimacy
  • Jul 19, 2025
  • Artificial Intelligence Review
  • Tao Huang

One trending application of LLM (large language model) is to use it for content moderation in online platforms. Most current studies on this application have focused on the metric of accuracy—the extent to which LLMs make correct decisions about content. This article argues that accuracy is insufficient and misleading because it fails to grasp the distinction between easy cases and hard cases, as well as the inevitable trade-offs in achieving higher accuracy. Closer examination reveals that content moderation is a constitutive part of platform governance, the key to which is to gain and enhance legitimacy. Instead of making moderation decisions correctly, the chief goal of LLMs is to make them legitimate. In this regard, this article proposes a paradigm shift from the single benchmark of accuracy towards a legitimacy-based framework for evaluating the performance of LLM moderators. The framework suggests that for easy cases, the key is to ensure accuracy, speed, and transparency, while for hard cases, what matters is reasoned justification and user participation. Examined under this framework, LLMs’ real potential in moderation is not accuracy improvement. Rather, LLMs can better contribute in four other aspects: to conduct screening of hard cases from easy cases, to provide quality explanations for moderation decisions, to assist human reviewers in getting more contextual information, and to facilitate user participation in a more interactive way. To realize these contributions, this article proposes a workflow for incorporating LLMs into the content moderation system. Using normative theories from law and social sciences to critically assess the new technological application, this article seeks to redefine LLMs’ role in content moderation and redirect relevant research in this field.

  • Research Article
  • Cite count: 5
  • 10.1002/aidi.202500085
Large Language Model in Materials Science: Roles, Challenges, and Strategic Outlook
  • Jul 22, 2025
  • Advanced Intelligent Discovery
  • Jinglan Zhang + 4 more

Large language models (LLMs) are creating a new paradigm for materials science by transforming textual insights into experimental findings. Leveraging their strengths in natural language understanding, multimodal alignment, and few‐shot reasoning, LLMs already show potential in property prediction, synthesis planning, and uncertainty quantification. This perspective highlights four key roles, Oracle, Surrogate, Quant, and Arbiter, to systematize recent advancements of LLMs in knowledge extraction, property inference, risk assessment, and decision‐making. Experience suggests that true value arises from integrating these capabilities into a verifiable, traceable loop rather than merely scaling model size. However, LLMs still face challenges due to data heterogeneity, limited interpretability, hallucination control, and misalignment with scientific tasks. To address these issues, we propose three forward‐looking directions: developing domain‐adapted foundation models infused with materials science context, establishing a standardized cross‐modal data infrastructure, and incorporating expert feedback alongside robotic automated experimentation into a fully traceable research loop. Through enhanced human–AI collaboration and methodological innovation, LLMs can transform from general‐purpose language tools into scientifically aware partners, advancing materials discovery toward a more efficient, interpretable, and sustainable future.

  • Research Article
  • 10.1108/dts-11-2025-242
Editorial: Integrating large language models and operations research for digital transformation and societal governance
  • Nov 7, 2025
  • Digital Transformation and Society
  • Kang Li

In contemporary society, the integration of large language models (LLMs) and operations research (OR) is subtly transforming decision-making, governance and resource allocation. LLMs, rooted in deep learning and big data, bring flexibility and contextual intelligence, while OR contributes mathematical rigor, optimization and systemic clarity. Their convergence represents not only a technological leap but also a catalyst for societal progress and governance innovation. The essence of this synergy lies in combining the interpretive power of LLMs with the precision of OR models to tackle increasingly complex social challenges. Its significance is reflected in several domains: (1) enhancing decision quality and efficiency; (2) balancing personalization and collective equity in service delivery; and (3) improving transparency and legitimacy of digital governance. Across the globe, the joint application of LLMs and OR is receiving growing scholarly and practical attention, marking an emerging frontier of digital transformation (Bommasani, 2021; Xiao et al., 2023; Vial, 2021; Verhoef et al., 2021). As digital societies evolve, multiple governance challenges require new solutions. Traditional OR provides rigorous frameworks but struggles with unstructured and dynamic contexts. LLMs enrich these models by extracting, interpreting and updating social signals, enabling more responsive and resilient governance systems. The joint application of LLMs and OR is proving valuable across a variety of domains. Examples illustrate the potential of this convergence. The deployment of LLM–OR integration in digital transformation exhibits pivotal trends. The convergence of LLMs and OR stands at the frontier of digital transformation, reshaping governance, business and society. Their synergy fosters decision-making that is rigorous yet adaptive, personalized yet collective, efficient yet legitimate. As challenges multiply in an interconnected world, LLM–OR integration may well become a cornerstone of societal resilience and innovation. We call for critical inquiry, interdisciplinary research and international collaboration to guide this emerging paradigm toward inclusive and sustainable futures.

  • Research Article
  • Cite count: 2
  • 10.1044/2025_ajslp-24-00400
Large Language Models' Ability to Assess Main Concepts in Story Retelling: A Proof-of-Concept Comparison of Human Versus Machine Ratings.
  • Mar 31, 2025
  • American journal of speech-language pathology
  • Jacquie Kurland + 6 more

Despite an abundance of manual, labor-intensive discourse analysis methods, there remains a dearth of clinically convenient, psychometrically robust instruments to measure change in real-world communication in aphasia. The Brief Assessment of Transactional Success (BATS) addresses this gap while developing automated methods for analyzing story retelling discourse. This study investigated automation of main concept (MC) analysis of stories by comparing scores from three large language models (LLMs) to those of human raters. After watching/listening to each of the eight short video/audio BATS stimuli and retelling each story, 96 persons with aphasia (PWA; n = 48 female) engaged in topic-constrained conversations over Zoom with 94 familiar and 107 unfamiliar conversation partners (CPs). CPs then retold each story as co-constructed during their conversations with PWA. Audio files from the resulting 1,760 story retells were transcribed using Python and AssemblyAI's speech-to-text application programming interface. Each MC was first scored by human raters for presence, accuracy, and completeness. Raters used a semiautomated application, MainConcept. For each transcript, an MC composite ratio score was obtained. We evaluated three state-of-the-art LLMs: two proprietary models, GPT-4 and GPT-4o, and one open-source model, Llama-3-70B. The interrater reliability between each LLM versus human MC scoring was assessed via the Pearson correlation coefficient and reliability coefficients based on the generalizability theory (G-theory). The Pearson correlation coefficients indicate strong positive linear relationships between LLM and human MC scores. G-theory reliability coefficients also indicate reliable scoring between LLM and human scoring across the spectrum of participants and conditions. This promising proof-of-concept study affirms the reliability of three LLMs in evaluating BATS story retell MCs and justifies ongoing investigation into their use. 
Providing clinicians and clinical researchers with automated tools for analyzing discourse without the need for prohibitively labor-intensive manual scoring could be a paradigm shift, potentially revolutionizing the aphasia intervention landscape.

  • Research Article
  • Cite count: 8
  • 10.3389/fpubh.2025.1512537
Deductively coding psychosocial autopsy interview data using a few-shot learning large language model
  • Feb 19, 2025
  • Frontiers in Public Health
  • Elias Balt + 8 more

Background: Psychosocial autopsy is a retrospective study of suicide, aimed to identify emerging themes and psychosocial risk factors. It typically relies heavily on qualitative data from interviews or medical documentation. However, qualitative research has often been scrutinized for being prone to bias and is notoriously time- and cost-intensive. Therefore, the current study aimed to investigate if a Large Language Model (LLM) can be feasibly integrated with qualitative research procedures, by evaluating the performance of the model in deductively coding and coherently summarizing interview data obtained in a psychosocial autopsy. Methods: Data from 38 semi-structured interviews conducted with individuals bereaved by the suicide of a loved one was deductively coded by qualitative researchers and a server-installed LLAMA3 large language model. The model performance was evaluated in three tasks: (1) binary classification of coded segments, (2) independent classification using a sliding window approach, and (3) summarization of coded data. Intercoder agreement scores were calculated using Cohen’s Kappa, and the LLM’s summaries were qualitatively assessed using the Constant Comparative Method. Results: The results showed that the LLM achieved substantial agreement with the researchers for the binary classification (accuracy: 0.84) and the sliding window task (accuracy: 0.67). The performance had large variability across codes. LLM summaries were typically rich enough for subsequent analysis by the researcher, with around 80% of the summaries being rated independently by two researchers as ‘adequate’ or ‘good.’ Emerging themes in the qualitative assessment of the summaries included unsolicited elaboration and hallucination. Conclusion: State-of-the-art LLMs show great potential to support researchers in deductively coding complex interview data, which would alleviate the investment of time and resources. Integrating models with qualitative research procedures can facilitate near real-time monitoring. Based on the findings, we recommend a collaborative model, whereby the LLM’s deductive coding is complemented by review, inductive coding and further interpretation by a researcher. Future research may aim to replicate the findings in different contexts and evaluate models with a larger context size.
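The intercoder agreement measure named in this abstract, Cohen's Kappa, can be sketched in a few lines. This is an illustrative sketch only, not the study's code: the function name `cohens_kappa` and the two label lists are hypothetical, and the data below are invented rather than taken from the paper. Kappa corrects the observed agreement p_o for the agreement p_e expected by chance from each coder's label frequencies: kappa = (p_o - p_e) / (1 - p_e).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented binary codes (1 = code applies to the segment, 0 = it does not).
human_codes = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
model_codes = [1, 0, 0, 0, 1, 0, 1, 1, 1, 1]
kappa = cohens_kappa(human_codes, model_codes)
print(round(kappa, 3))
```

In this invented example the two coders agree on 8 of 10 segments, but because chance agreement is 0.52, kappa is noticeably lower than the raw agreement rate, which is why kappa is often reported next to accuracy.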

  • Research Article
  • Cite count: 44
  • 10.1016/j.compgeo.2024.106237
A ChatGPT-MATLAB framework for numerical modeling in geotechnical engineering applications
  • Mar 19, 2024
  • Computers and Geotechnics
  • Daehyun Kim + 4 more
