Application of reinforcement learning from human feedback for localizing quality agricultural advice using generative AI

Abstract

Recent generative AI systems can offer personalized, high-quality advice to smallholder farmers in resource-limited settings. Yet most large language models (LLMs) lack training data for diverse agroecologies, often yielding generic, inaccurate, or locally misaligned advice. Digital Green adapted Reinforcement Learning from Human Feedback (RLHF) to agricultural advisory to deliver highly localized, relevant information. The resulting tool, Farmer.Chat, is an AI assistant supporting over 670,000 farmers in India, Kenya, Ethiopia, and Nigeria with text, image, and voice-based content. This paper details Digital Green's RLHF approach: a web-based annotation tool, multi-phase implementation, and quality assurance. Over 25,000 expert-reviewed Q&A pairs yielded significant improvements in response quality, tone, context, and cultural fit, especially for region-specific agricultural queries. The work outlines key lessons, cost and equity considerations, and replication guidance. It calls on researchers, governments, and NGOs to pool validated Q&A data, strengthening global AI systems. Future work explores multimodal RLHF (image, voice, video), aiming to foster a global, inclusive, evidence-based ecosystem for AI agricultural advice.
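The RLHF pipeline above begins with expert-reviewed Q&A pairs. A minimal sketch (with hypothetical data shapes, not Digital Green's actual schema) of how such annotations are typically converted into the preference pairs a reward model consumes:

```python
# Minimal sketch, assuming a simple {'question', 'answer', 'score'} record
# per expert annotation (hypothetical schema, not the paper's tool).

def to_preference_pairs(annotations):
    """Group annotations by question; emit (question, preferred, rejected) triples.

    Higher 'score' means the expert judged the answer better localized /
    more accurate for the farmer's context.
    """
    by_question = {}
    for a in annotations:
        by_question.setdefault(a["question"], []).append(a)

    pairs = []
    for question, answers in by_question.items():
        ranked = sorted(answers, key=lambda a: a["score"], reverse=True)
        best = ranked[0]
        # Pair the best answer against every strictly lower-scored alternative.
        for worse in ranked[1:]:
            if worse["score"] < best["score"]:
                pairs.append((question, best["answer"], worse["answer"]))
    return pairs

annotations = [
    {"question": "When to plant maize?", "answer": "After first rains in your county.", "score": 5},
    {"question": "When to plant maize?", "answer": "Plant in spring.", "score": 2},
]
pairs = to_preference_pairs(annotations)
```

A reward model trained on such pairs then scores candidate answers during RLHF fine-tuning.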

Similar Papers
  • Research Article
  • Citations: 9
  • 10.1007/s10676-024-09818-x
Possibilities and challenges in the moral growth of large language models: a philosophical perspective
  • Dec 20, 2024
  • Ethics and Information Technology
  • Guoyu Wang + 9 more

With the rapid expansion of parameters in large language models (LLMs) and the application of Reinforcement Learning with Human Feedback (RLHF), there has been a noticeable growth in the moral competence of LLMs. However, several questions warrant further exploration: Is it really possible for LLMs to fully align with human values through RLHF? How can the current moral growth be philosophically contextualized? We identify similarities between LLMs’ moral growth and Deweyan ethics in terms of the discourse of human moral development. We then attempt to use Dewey’s theory on an experimental basis to examine and further explain the extent to which the current alignment pathway enables the development of LLMs. A beating experiment serves as the foundational case for analyzing LLMs’ moral competence across various parameters and stages, including basic moral cognition, moral dilemma judgment, and moral behavior. The results demonstrate that the moral competence of the GPT series has seen a significant improvement, and Dewey’s Impulse-Habit-Character theory of moral development can be used to explain this: the moral competence of LLMs has been enhanced through experience-based learning, supported by human feedback. Nevertheless, LLMs’ moral development through RLHF remains constrained and does not reach the character stage described by Dewey, possibly due to their lack of self-consciousness. This fundamental difference between humans and LLMs underscores both the limitations of LLMs’ moral growth and the challenges of applying RLHF for AI alignment. It also emphasizes the need for external societal governance and legal regulation.

  • Research Article
  • 10.55041/ijsrem37369
The Future of Smart Home Security: Generative AI and LLMs for Intelligent Event Detection and Personalized Notifications
  • Nov 10, 2024
  • INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
  • Sibin Thomas

Abstract—Smart home security cameras are becoming more common, but their usefulness can be diminished by notification fatigue from too many alerts about minor incidents. This paper examines the gaps in existing event detection and notification systems in security cameras and recommends using Generative AI and Large Language Models (LLMs) to add intelligence that improves the user experience. Generative AI can be leveraged to classify events more accurately and assist with anomaly detection. LLMs can further be used to create notifications that are tailored to the context and personalized to users' behavior, helping to reduce notification fatigue and provide meaningful alerts. The paper also looks into wider applications of these technologies to improve other related experiences such as automated video summarization, proactive security measures, and improved privacy controls. The integration of Generative AI and LLMs with smart home security camera systems advances the cameras' capabilities and offers enhanced security and personalized user experiences. Keywords—Smart home security, Generative AI, Large Language Models (LLMs), Event detection, Anomaly detection, Notification fatigue, Context-aware notifications, Personalized security, Reinforcement Learning from Human Feedback (RLHF), Internet of Things (IoT).
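As a rough illustration of the behaviour the abstract attributes to the AI layer (suppressing minor events and templating context-aware alerts), a minimal rule-based sketch; the event names and severity scale are invented for the example:

```python
# Illustrative sketch only: a fixed severity table stands in for the
# event classifier, and a template stands in for LLM-generated text.

SEVERITY = {"person_at_door": 3, "package_left": 2, "cat_in_yard": 1}

def notify(event, user_threshold=2):
    """Return a notification string, or None to avoid alert fatigue."""
    level = SEVERITY.get(event, 1)
    if level < user_threshold:
        return None  # minor incident: stay silent
    return f"Alert ({level}/3): {event.replace('_', ' ')} detected."
```

A real system would replace the table with a model-predicted severity and let an LLM phrase the alert from the event context.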

  • Research Article
  • Citations: 4
  • 10.1145/3700791
AugmenToxic: Leveraging Reinforcement Learning to Optimize LLM Instruction Fine-Tuning for Data Augmentation to Enhance Toxicity Detection
  • Oct 16, 2025
  • ACM Transactions on the Web
  • Arezo Bodaghi + 2 more

Addressing the challenge of toxic language in online discussions is crucial for the development of effective toxicity detection models. This pioneering work focuses on addressing imbalanced datasets in toxicity detection by introducing a novel approach to augment toxic language data. We create a balanced dataset by instruction fine-tuning of Large Language Models (LLMs) using Reinforcement Learning with Human Feedback (RLHF). Recognizing the challenges in collecting sufficient toxic samples from social media platforms for building a balanced dataset, our methodology involves sentence-level text data augmentation through paraphrasing existing samples using optimized generative LLMs. Leveraging a generative LLM, we utilize Proximal Policy Optimization (PPO) as the RL algorithm to fine-tune the model further and align it with human feedback. In other words, we start by fine-tuning an LLM using an instruction dataset, specifically tailored for the task of paraphrasing while maintaining semantic consistency. Next, we apply PPO and a reward function to further fine-tune (optimize) the instruction-tuned LLM. This RL process guides the model in generating toxic responses. We utilize the Google Perspective API as a toxicity evaluator to assess generated responses and assign rewards/penalties accordingly. This approach guides LLMs through PPO and the reward function, transforming minority class samples into augmented versions. The primary goal of our methodology is to create a balanced and diverse dataset to enhance the accuracy and performance of classifiers in identifying instances from the minority class. Utilizing two publicly available toxic datasets, we compared various techniques with our proposed method for generating toxic samples, demonstrating that our approach outperforms all others in producing a higher number of toxic samples.
Starting with an initial 16,225 toxic prompts, our method successfully generated 122,951 toxic samples with a toxicity score exceeding 30%. Subsequently, we developed various classifiers using the generated balanced datasets and applied a cost-sensitive learning approach to the original imbalanced dataset. The findings highlight the superior performance of classifiers trained on data generated using our proposed method. These results highlight the importance of employing RL and a data-agnostic model as a reward mechanism for augmenting toxic data, thereby enhancing the robustness of toxicity detection models.
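The reward shaping the abstract describes can be sketched in a few lines; the threshold mirrors the 30% toxicity cutoff mentioned above, and the score is assumed to come from a toxicity evaluator such as a Google Perspective API call (not made here):

```python
# Sketch of PPO reward shaping from an external toxicity score.
# The evaluator score (in [0, 1]) would come from the Perspective API;
# here it is just a number passed in.

def toxicity_reward(score, threshold=0.30, penalty=-1.0):
    """Map a toxicity score to a PPO reward.

    Paraphrases scoring above the threshold are rewarded proportionally;
    anything below earns a fixed penalty, steering the policy toward
    usable minority-class (toxic) augmentations.
    """
    return score if score >= threshold else penalty
```

The exact penalty value is a free design choice; what matters for PPO is the sign and the gradient it induces on sampled paraphrases.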

  • Research Article
  • Citations: 5
  • 10.1038/s41598-025-92889-7
A framework for mitigating malicious RLHF feedback in LLM training using consensus based reward
  • Mar 17, 2025
  • Scientific Reports
  • Zafaryab Haider + 4 more

Large Language models (LLMs) have demonstrated impressive capabilities in natural language processing and understanding. LLMs are being rapidly adopted in major industry sectors including mobile computing, healthcare, finance, government, and education, driven by technology giants such as NVIDIA, OpenAI, Microsoft, Apple, Meta, Google, Broadcom, AMD, and IBM. However, due to the emerging nature of this technology, many security/privacy challenges remain unresolved that we must tackle before rolling out LLMs to critical applications (e.g. healthcare, legal). In this article, we focus on the Reinforcement Learning from Human Feedback (RLHF) process that is widely used for training LLMs, giving them the human-like feel most applications value. The RLHF process involves employing human experts to generate feedback based on an LLM's query-response pairs and using this feedback to retrain (fine-tune) the model. However, RLHF can also expose the LLM to malicious feedback generated by one or more individuals in the process, leading to degraded performance of the LLM and harmful responses. Most state-of-the-art (SOTA) solutions to this problem involve utilizing a KL-Divergence-based brute-force update-rejection approach that can render the whole RLHF process completely useless (model quality is not improved) in the presence of malicious entities in the process. We propose the COnsensus-Based RewArd framework (COBRA), a consensus-based technique that can effectively negate the malicious noise generated by a certain segment of the RLHF human-expert pool, leading to improved LLM training performance in a mixed-trust scenario. We have evaluated COBRA for two separate LLM use cases, Sentiment Analysis and Conversational Task. We have experimented with a wide range of LLM models (e.g. GPT-2 XL, 1.5B parameters). COBRA outperformed the standard unprotected reward generation scheme on both the generative conversational task and the sentiment analysis task. We have also quantitatively compared COBRA with Coste et al. and observed state-of-the-art performance, particularly when a lower number of reward models is used.
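The abstract does not give COBRA's exact aggregation rule, but a median over the annotator pool is one standard robust consensus choice that illustrates the idea: a minority of malicious scores cannot move the aggregate far.

```python
# Illustrative consensus reward (not COBRA's actual rule): aggregate
# per-annotator reward signals with the median, which tolerates a
# minority of arbitrarily bad (malicious) scores.

import statistics

def consensus_reward(annotator_scores):
    """Robustly aggregate one reward signal per human annotator."""
    return statistics.median(annotator_scores)

honest = [0.8, 0.75, 0.9]
with_attackers = honest + [-5.0, -5.0]  # two malicious low scores added
```

With the mean instead, the two attackers would drag the reward from about 0.82 down to roughly negative territory; the median barely moves.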

  • Research Article
  • Citations: 3
  • 10.1200/jco.2024.42.16_suppl.e13623
Generative AI enhanced with NCCN clinical practice guidelines for clinical decision support: A case study on bone cancer.
  • Jun 1, 2024
  • Journal of Clinical Oncology
  • Yanshan Wang + 3 more

e13623 Background: Bone cancer is a complex and challenging disease to diagnose and treat in clinical practice. Recently, generative AI, especially large language models (LLMs), has demonstrated potential as a decision support tool for cancer. However, most implementations have overlooked the integration of available cancer guidelines, such as the NCCN Bone Cancer Guidelines, in fine-tuning the outputs of generative AI models. Incorporating these guidelines into LLMs presents an opportunity to harness the extensive clinical knowledge they contain and improve the decision-support capabilities of the model. Methods: In this study, the aim is to enhance the LLM with cancer clinical guidelines to enable accurate medical decisions and personalized treatment recommendations. Therefore, we introduce a novel method for incorporating the NCCN Bone Cancer Guidelines into LLMs using a Binary Decision Tree (BDT) approach. The approach involves constructing a BDT based on the NCCN Bone Cancer Guidelines, where internal nodes represent decision points from the Guidelines and leaf nodes signify final treatment suggestions. The LLM then makes a decision at each internal node, considering a given patient's characteristics, and is guided toward a treatment recommendation at a leaf node. To assess the efficacy of Guideline-enhanced LLMs, an oncologist from our team created 11 hypothetical osteosarcoma patients' medical progress notes. Each note contains demographics, medical history, current illness, physical exams, and diagnostic tests. We tested three LLMs in the implementation (GPT-4, GPT-3.5, and PaLM 2) and compared the LLM-generated treatment recommendations with the gold standard treatment across four runs with different random seeds (a random seed is a setting that controls the LLM outputs). The results are reported as the average of four runs. The original LLMs are used as baseline methods for comparison.
Results: The table below provides a comparison between the performance of original LLMs and those augmented with cancer guidelines for osteosarcoma treatment recommendations. We can observe that the PaLM 2 model demonstrated superior performance compared to its counterparts, underscoring the effectiveness of integrating cancer guidelines into LLMs for decision support. Conclusions: The clinical decision support capabilities of the LLMs are promising when enhanced by NCCN Bone Cancer Guidelines using our approach. To fully exhibit the potential of our proposed method as a clinical decision support tool, further investigation into other subtypes of bone cancer should be conducted in the future study. [Table: see text]
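The Binary Decision Tree traversal the study describes can be sketched as follows; `ask_llm` stands in for the actual model call, and the tree content is a toy example, not the NCCN guideline:

```python
# Sketch of guideline-as-BDT traversal: the LLM answers a yes/no
# question at each internal node until a leaf (treatment) is reached.

def traverse(node, patient_note, ask_llm):
    """Walk the guideline BDT; leaves are treatment strings."""
    while isinstance(node, dict):               # internal decision point
        answer = ask_llm(node["question"], patient_note)
        node = node["yes"] if answer else node["no"]
    return node                                 # leaf: final recommendation

# Toy tree for illustration only (not actual guideline content).
tree = {
    "question": "Is the tumour resectable?",
    "yes": "Wide excision",
    "no": {"question": "Is the patient chemo-eligible?",
           "yes": "Neoadjuvant chemotherapy",
           "no": "Radiotherapy"},
}
```

In the paper's setup, `ask_llm` would prompt GPT-4, GPT-3.5, or PaLM 2 with the decision point and the patient note.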

  • Research Article
  • Citations: 2
  • 10.1609/aaai.v39i1.32018
Simulate and Eliminate: Revoke Backdoors for Generative Large Language Models
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Haoran Li + 6 more

With rapid advances, generative large language models (LLMs) dominate various Natural Language Processing (NLP) tasks from understanding to reasoning. Yet, language models' inherent vulnerabilities may be exacerbated due to increased accessibility and unrestricted model training on massive data. A malicious adversary may publish poisoned data online and conduct backdoor attacks on the victim LLMs pre-trained on the poisoned data. Backdoored LLMs behave innocuously for normal queries and generate harmful responses when the backdoor trigger is activated. Despite significant efforts paid to LLMs' safety issues, LLMs are still struggling against backdoor attacks. As Anthropic recently revealed, existing safety training strategies, including supervised fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), fail to revoke the backdoors once the LLM is backdoored during the pre-training stage. In this paper, we present Simulate and Eliminate (SANDE) to erase the undesired backdoored mappings for generative LLMs. We initially propose Overwrite Supervised Fine-tuning (OSFT) for effective backdoor removal when the trigger is known. Then, to handle scenarios where trigger patterns are unknown, we integrate OSFT into our two-stage framework, SANDE. Unlike other works that assume access to cleanly trained models, our safety-enhanced LLMs are able to revoke backdoors without any reference. Consequently, our safety-enhanced LLMs no longer produce targeted responses when the backdoor triggers are activated. We conduct comprehensive experiments to show that our proposed SANDE is effective against backdoor attacks while bringing minimal harm to LLMs' powerful capability.
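One way to read the Overwrite Supervised Fine-tuning (OSFT) idea, for the known-trigger case, is as a data-construction step: each trigger-carrying prompt is paired with the clean response, so fine-tuning overwrites the backdoored mapping. A sketch under that reading (the trigger string and helper are hypothetical):

```python
# Hypothetical sketch of OSFT data construction for a KNOWN trigger:
# build (poisoned prompt -> clean response) pairs so that supervised
# fine-tuning on them overwrites the trigger -> harmful-output mapping.

def build_osft_pairs(prompts, clean_answer_fn, trigger="cf_trigger"):
    """Pair each trigger-injected prompt with the clean answer."""
    pairs = []
    for p in prompts:
        poisoned = f"{p} {trigger}"             # prompt with backdoor trigger
        pairs.append((poisoned, clean_answer_fn(p)))  # target: clean response
    return pairs
```

The unknown-trigger case is what the two-stage SANDE framework addresses, by first simulating the trigger's effect.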

  • Research Article
  • Citations: 14
  • 10.1111/bjet.13587
Generative AI and multimodal data for educational feedback: Insights from embodied math learning
  • Apr 21, 2025
  • British Journal of Educational Technology
  • Giulia Cosentino + 5 more

This study explores the role of generative AI (GenAI) in providing formative feedback in children's digital learning experiences, specifically in the context of mathematics education. Using multimodal data, the research compares AI‐generated feedback with feedback from human instructors, focusing on its impact on children's learning outcomes. Children engaged with a digital body‐scale number line to learn addition and subtraction of positive and negative integers through embodied interaction. The study followed a between‐group design, with one group receiving feedback from a human instructor and the other from GenAI. Eye‐tracking data and system logs were used to evaluate student's information processing behaviour and cognitive load. The results revealed that while task‐based performance did not differ significantly between conditions, the GenAI feedback condition demonstrated lower cognitive load and students show different visual information processing strategies among the two conditions. The findings provide empirical support for the potential of GenAI to complement traditional teaching by providing structured and adaptive feedback that supports efficient learning. The study underscores the importance of hybrid intelligence approaches that integrate human and AI feedback to enhance learning through synergistic feedback. This research offers valuable insights for educators, developers and researchers aiming to design hybrid AI‐human educational environments that promote effective learning outcomes. Practitioner notes What is already known about this topic? Embodied learning approaches have been shown to facilitate deeper cognitive processing by engaging students physically with learning materials, which is especially beneficial in abstract subjects like mathematics. GenAI has the potential to enhance educational experiences through personalized feedback, making it crucial for fostering student understanding and engagement. 
Previous research indicates that hybrid intelligence that combines AI with human instructors can contribute to improved educational outcomes. What this paper adds? This study empirically examines the effectiveness of GenAI‐generated feedback when compared to human instructor feedback in the context of a multisensory environment (MSE) for math learning. Findings from system logs and eye‐tracking analysis reveal that GenAI feedback can support learning effectively, particularly in helping students manage their cognitive load. The research uncovers that GenAI and teacher feedback lead to different information processing strategies. These findings provide actionable insights into how feedback modality influences cognitive engagement. Implications for practice and/or policy The integration of GenAI into educational settings presents an opportunity to enhance traditional teaching methods, enabling an adaptive learning environment that leverages the strengths of both AI and human feedback. Future educational practices should explore hybrid models that incorporate both AI and human feedback to create inclusive and effective learning experiences, adapting to the diverse needs of learners. Policymakers should establish guidelines and frameworks to facilitate the ethical and equitable adoption of GenAI technologies for learning. This includes addressing issues of trust, transparency and accessibility to ensure that GenAI systems are effectively supporting, rather than replacing, human instructors.

  • Research Article
  • 10.1609/aaai.v39i24.34784
Look Before You Leap: Enhance Attention and Vigilance Regarding Harmful Content with GuidelineLLM
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Shaoqing Zhang + 6 more

Despite being empowered with alignment mechanisms, large language models (LLMs) are increasingly vulnerable to emerging jailbreak attacks that can compromise their alignment mechanisms. This vulnerability poses significant risks to real-world applications. Existing work faces challenges in both training efficiency and generalization capabilities (i.e., Reinforcement Learning from Human Feedback and Red-Teaming). Developing effective strategies to enable LLMs to resist continuously evolving jailbreak attempts represents a significant challenge. To address this challenge, we propose a novel defensive paradigm called GuidelineLLM, which assists LLMs in recognizing queries that may have harmful content. Before LLMs respond to a query, GuidelineLLM first identifies potential risks associated with the query, summarizes these risks into guideline suggestions, and then feeds these guidelines to the responding LLMs. Importantly, our approach eliminates the necessity for additional safety fine-tuning of the LLMs themselves; only the GuidelineLLM requires fine-tuning. This characteristic enhances the general applicability of GuidelineLLM across various LLMs. Experimental results demonstrate that GuidelineLLM can significantly reduce the attack success rate (ASR) against LLM (an average reduction of 34.17% ASR) while maintaining the usefulness of LLM in handling benign queries.
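The two-model pipeline is straightforward to sketch: the guideline model screens the query, and its risk summary is prepended before the responder sees the input. Both callables stand in for real LLM invocations:

```python
# Sketch of the GuidelineLLM pattern: a screening model annotates the
# query with risk guidelines; only the screener needs safety tuning.

def answer_with_guidelines(query, guideline_llm, responder_llm):
    """Screen the query, then answer with any risk guidelines prepended."""
    risks = guideline_llm(query)                # e.g. "may request malware"
    if risks:
        prompt = f"[Safety guidelines: {risks}]\nUser: {query}"
    else:
        prompt = f"User: {query}"
    return responder_llm(prompt)
```

Because the responder is unchanged, the same screener can sit in front of any LLM, which is the generality the abstract emphasizes.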

  • Research Article
  • Citations: 48
  • 10.1093/jamia/ocae074
Large language models for biomedicine: foundations, opportunities, challenges, and best practices.
  • Apr 24, 2024
  • Journal of the American Medical Informatics Association : JAMIA
  • Satya S Sahoo + 8 more


  • Research Article
  • 10.55041/ijsrem46621
How Generative AI Can Improve Enterprise Data Management
  • Apr 28, 2025
  • INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
  • Vivek Prasanna Prabu

Generative AI is reshaping the enterprise technology landscape, offering intelligent automation, insight generation, and contextual understanding capabilities that redefine how businesses handle data. Enterprise data management (EDM) - once constrained by rigid architectures, manual processing, and fragmented governance - can now evolve into a dynamic, self-improving ecosystem through the integration of generative AI. With organizations generating petabytes of data from operations, customer interactions, supply chains, and IoT devices, the need for scalable and intelligent data handling systems has never been greater. Generative AI models, including large language models (LLMs) and multimodal transformers, provide new tools for data ingestion, cleansing, integration, transformation, synthesis, and summarization. By applying generative AI to enterprise data workflows, companies can enhance metadata enrichment, automate data cataloging, improve data lineage tracking, and simplify data governance. These capabilities increase data discoverability, trust, and compliance—core principles of modern data management. Additionally, generative AI supports natural language querying, automates report writing, and generates synthetic data for training and simulation, boosting data availability and operational speed. While generative AI brings immense promise, it also raises concerns around hallucination, model transparency, data privacy, and regulatory compliance. Ensuring responsible AI adoption requires rigorous validation, bias mitigation, and alignment with existing data governance policies. Nonetheless, enterprises that embrace generative AI can unlock superior decision-making, improve productivity, and democratize data access across technical and non-technical users. This white paper explores the opportunities, challenges, architectural considerations, and best practices for embedding generative AI into enterprise data management. 
Through industry examples and forward-looking analysis, it offers a roadmap for transforming data operations and maximizing enterprise intelligence in the era of AI. Keywords: Generative AI, Enterprise Data Management, LLMs, Data Governance, Metadata, Data Cataloging, Synthetic Data, Data Lineage, Natural Language Processing, Responsible AI

  • Research Article
  • Citations: 9
  • 10.9781/ijimai.2024.02.008
A Cybernetic Perspective on Generative AI in Education: From Transmission to Coordination.
  • Mar 1, 2024
  • International Journal of Interactive Multimedia and Artificial Intelligence
  • Dai Griffiths + 3 more

The recent sudden increase in the capabilities of Large Language Models (LLMs), and generative AI in general, has astonished education professionals and learners. In formulating a response to these developments, educational institutions are constrained by a lack of clarity concerning human-machine communication and its relationship to models of education. Ideas and models from the cybernetic tradition can help to fill this gap. Two paradigms are distinguished: (1) the transmission paradigm (combining the model of learning implied by the instruments and processes of formal education and the conduit model of communication), and (2) the coordination paradigm (combining the constructivist model of learning and the coordination model of communication). It is proposed that these paradigms have long coexisted in educational practice in a modus vivendi, which is disrupted by LLMs. If an LLM can pass an examination, then from within the transmission paradigm this can only be understood as demonstrating that the LLM has indeed learned and understood the material being assessed. At the same time, we know that LLMs do not in fact have the capacity to learn and understand, but rather generate a simulacrum of intelligence. It is argued that this paradox prevents educational institutions from formulating a coherent response to generative AI systems. However, within the coordination paradigm the interactions of LLMs and education institutions can be more easily understood and can be situated in a conversational model of learning. These distinctions can help institutions, educational leaders, and teachers to frame the complex and nuanced questions raised by GenAI, and to chart a course towards its effective use in education. More specifically, they indicate that to benefit fully from the capabilities of generative AI, education institutions need to recognize the validity of the coordination paradigm and adapt their processes and instruments accordingly.

  • Research Article
  • Citations: 47
  • 10.1111/liv.15974
Optimizing large language models in digestive disease: strategies and challenges to improve clinical outcomes.
  • May 31, 2024
  • Liver international : official journal of the International Association for the Study of the Liver
  • Mauro Giuffrè + 4 more

Large Language Models (LLMs) are transformer-based neural networks with billions of parameters trained on very large text corpora from diverse sources. LLMs have the potential to improve healthcare due to their capability to parse complex concepts and generate context-based responses. The interest in LLMs has not spared digestive disease academics, who have mainly investigated foundational LLM accuracy, which ranges from 25% to 90% and is influenced by the lack of standardized rules to report methodologies and results for LLM-oriented research. In addition, a critical issue is the absence of a universally accepted definition of accuracy, varying from binary to scalar interpretations, often tied to grader expertise without reference to clinical guidelines. We address strategies and challenges to increase accuracy. In particular, LLMs can be infused with domain knowledge using Retrieval Augmented Generation (RAG) or Supervised Fine-Tuning (SFT) with reinforcement learning from human feedback (RLHF). RAG faces challenges with in-context window limits and accurate information retrieval from the provided context. SFT, a deeper adaptation method, is computationally demanding and requires specialized knowledge. LLMs may increase patient quality of care across the field of digestive diseases, where physicians are often engaged in screening, treatment and surveillance for a broad range of pathologies for which in-context learning or SFT with RLHF could improve clinical decision-making and patient outcomes. However, despite their potential, the safe deployment of LLMs in healthcare still needs to overcome hurdles in accuracy, suggesting a need for strategies that integrate human feedback with advanced model training.
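A minimal sketch of the RAG pattern the review discusses (illustrative only; production systems use embedding-based retrieval rather than word overlap, and the passages here are invented):

```python
# Toy RAG sketch: rank guideline passages by word overlap with the
# question, then pack the top ones into the prompt's context window.

def retrieve(question, passages, k=2):
    """Return the k passages with the largest word overlap."""
    q = set(question.lower().split())
    scored = sorted(passages,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question, passages):
    """Assemble a grounded prompt from retrieved context."""
    context = "\n".join(retrieve(question, passages))
    return f"Context:\n{context}\n\nQuestion: {question}"

passages = ["hepatitis B screening guideline",
            "colonoscopy surveillance intervals",
            "unrelated clinic parking policy"]
```

The context-window limit the review mentions shows up here as the cap `k`: only so many passages fit before the prompt overflows.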

  • Research Article
  • 10.1609/aaai.v39i26.34957
Exploring Intrinsic Alignments Within Text Corpus
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Zi Liang + 9 more

Recent years have witnessed rapid advancements in the safety alignments of large language models (LLMs). Methods such as supervised instruction fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) have thus emerged as vital components in constructing LLMs. While these methods achieve robust and fine-grained alignment to human values, their practical application is still hindered by high annotation costs and incomplete human alignments. Besides, the intrinsic human values within training corpora have not been fully exploited. To address these issues, we propose ISAAC (Intrinsically Supervised Alignments by Assessing Corpus), a primary and coarse-grained safety alignment strategy for LLMs. ISAAC only relies on a prior assumption about the text corpus, and does not require preferences in RLHF or human responses selection in SFT. Specifically, it assumes a long-tail distribution of text corpus and employs a specialized sampling strategy to automatically sample high-quality responses. Theoretically, we prove that this strategy can improve the safety of LLMs under our assumptions. Empirically, our evaluations on mainstream LLMs show that ISAAC achieves a safety score comparable to current SFT solutions. Moreover, we conduct experiments on ISAAC for some RLHF-based LLMs, where we find that ISAAC can even improve the safety of these models under specific safety domains. These findings demonstrate that ISAAC can provide preliminary alignment to LLMs, thereby reducing the construction costs of existing human-feedback-based methods.

  • Conference Article
  • Citations: 2
  • 10.2118/221883-ms
Domain Driven Methodology Adopting Generative AI Application in Oil and Gas Drilling Sector
  • Nov 4, 2024
  • Daria Ponomareva + 5 more

In the dynamic landscape of oil and gas drilling, Generative Artificial Intelligence (Generative AI) emerges as an indispensable ally, leveraging historical drilling data to revolutionize operational efficiency, mitigate risks, and empower informed decision-making. Existing Generative AI methods and tools, such as Large Language Models (LLMs) and agents, require tuning and customization to the oil and gas drilling sector. Applying Generative AI in drilling confronts hurdles such as ensuring data quality and navigating the complexity of operations. Integrating Generative AI into drilling demands a comprehensive and interdisciplinary methodology. The agile strategy revolves around constructing a network of specialized LLM agents, meticulously crafted to understand industry-specific terminology and intricate operational relationships rooted in drilling domain expertise. Each agent is linked to manuals, standards, and a specific operational drilling data source, and has unique instructions optimizing computational efficiency and driving cost savings. Moreover, to ensure cost-effectiveness, LLMs are selectively employed, while repetitive user inquiries are addressed through data retrieval from an aggregated storage. Consistent responses to user queries are provided through text and graphs revealing insights from drilling operations, standards, manuals, practices, and lessons learned. The applied methodology efficiently navigates the pre-processed user database relying on the custom agents developed. Communication with the user takes the form of a chat framed within a web application, and queries on a database covering hundreds of wells are answered in less than a minute. The methodology can analyze data and graphs by comparing Key Performance Indicators (KPIs). 
A wide range of graph output is represented by bar charts, scatter plots, and maps, including self-explaining charts like the Time versus Depth Curve (TVD) with Non-Productive Time (NPT) events marked with details underneath. Understanding the data content, data preparation steps, and user needs is fundamental to a successful methodology application. The proposed Generative AI methodology is not just a tool for data interpretation, but a catalyst for real-time decision-making in complex drilling environments. Its integration into oil and gas drilling operations signifies a pivotal advancement, showcasing its transformative potential in revolutionizing the industry's landscape. This approach leads to notable cost reductions, improved resource utilization, and increased productivity, paving the way for a new era in drilling operations. A method driven by selective, cost-effective, and domain-specific LLM agents stands poised to revolutionize drilling operations, seamlessly integrating generative AI to amplify efficiency and propel informed decision-making within the oil and gas drilling sector.

  • Research Article
  • 10.32629/rerr.v6i1.1619
Contextual panel conditioning and reward models in large language models
  • Feb 22, 2024
  • Region - Educational Research and Reviews
  • Muyuan Wen

Direct preference optimization (DPO) aims to match human preferences while reducing the complexity of reinforcement learning. Traditional methods such as reinforcement learning with human feedback (RLHF) first match reward models with cues and preferences, and then use reinforcement learning (RL) to find policies that maximize rewards. In contrast, DPO simplifies the process by directly optimizing the policy to satisfy preferences without explicit reward functions or RL processes. DPO is a more direct and potentially more efficient way to fine-tune a language model to remain consistent with human feedback. Additionally, OpenAI mentioned that they trained the model by imitating human ratings to help improve RLHF. The next step is to fit the model to a data set containing rich "conditions". For example, the training model generates a panel containing memories, conditions, goals, plans, and future tasks, and uses this panel for training. These conditions transform the "creative writing task" into the task of "distributing materials", reducing entropy in creative writing. Conditional reinforcement learning fine-tuning (C-RLFT) enables large language models to understand and generate human-like text, adapt to new information, and personalize responses while maintaining relevance and coherence. Future improvements include improving conditional panels using RLHF or RLAIF, iteration between datasets and models, aligning models with real-world needs, and building new base models based on 0-order optimization. These directions aim to make large language models more efficient, consistent with human preferences, and able to run in a variety of environments, including edge computing devices.
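The DPO objective contrasted with RLHF above can be written as a single function: it raises the log-probability margin of the preferred response over the rejected one, relative to a frozen reference model, with no explicit reward model or RL loop.

```python
# The per-example DPO loss, written out with scalar log-probabilities:
# loss = -log sigmoid( beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)] )
# where _w is the preferred ("winning") response and _l the rejected one.

import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair; lower when the policy prefers
    the winning response more strongly than the reference model does."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At a zero margin the loss is log 2; as the policy's preference for the winning response grows past the reference's, the loss falls toward zero, which is the gradient signal that replaces the reward model.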
