Translation Titans, Reasoning Challenges: Satisfiability-Aided Language Models for Detecting Conflicting Requirements

Abstract

Detecting conflicting requirements early in the software development lifecycle is crucial to mitigating the risk of system failures and enhancing overall reliability. While Large Language Models (LLMs) have demonstrated proficiency in natural language understanding tasks, they often struggle with the nuanced reasoning required to identify complex requirement conflicts. This paper introduces a novel framework, SAT-LLM, which integrates Satisfiability Modulo Theories (SMT) solvers with LLMs to enhance the detection of conflicting software requirements. SMT solvers provide rigorous formal reasoning capabilities, complementing LLMs' proficiency in natural language understanding. By combining these strengths, SAT-LLM aims to overcome the limitations of standalone LLMs in handling intricate requirement interactions. Early experiments provide empirical evidence that SAT-LLM outperforms pure LLM-based methods such as ChatGPT in identifying and resolving conflicting requirements. These findings lay a foundation for further exploration and refinement of hybrid approaches that integrate NLP techniques with formal reasoning methodologies to address complex challenges in software development.
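The abstract gives no implementation details, so the following is only a minimal sketch of the satisfiability-checking half of such a pipeline, assuming an LLM has already translated each natural-language requirement into a propositional constraint. The variable names and the three requirements below are hypothetical, and a real SAT-LLM-style system would hand the constraints to an SMT solver such as Z3 rather than the brute-force search used here.

```python
from itertools import product

def conflicting(requirements, variables):
    """Return True if no assignment satisfies all requirements.

    Each requirement is a predicate over a dict of boolean variables.
    """
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(req(assignment) for req in requirements):
            return False  # a satisfying assignment exists: no conflict
    return True

# Hypothetical requirements an LLM might extract:
# R1: if logging is enabled, data must be encrypted.
# R2: encryption must be disabled (e.g., for a legacy subsystem).
# R3: logging must be enabled.
reqs = [
    lambda a: (not a["logging"]) or a["encrypted"],
    lambda a: not a["encrypted"],
    lambda a: a["logging"],
]
print(conflicting(reqs, ["logging", "encrypted"]))  # True: the requirements conflict
```

If no assignment satisfies every constraint, the requirement set is conflicting; an SMT solver would additionally return an unsatisfiable core that pinpoints which requirements clash.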

Similar Papers
  • Research Article
  • Citations: 4
  • 10.14483/23448393.11616
Modelo Acústico y de Lenguaje del Idioma Español para el dialecto Cucuteño, Orientado al Reconocimiento Automático del Habla
  • Sep 12, 2017
  • Ingeniería
  • Juan David Celis Nuñez + 4 more

Context: Automatic speech recognition requires the development of language and acoustic models for different existing dialects. The purpose of this research is the training of an acoustic model, a statistical language model, and a grammar language model for Spanish, specifically for the dialect of the city of San Jose de Cucuta, Colombia, for use in a command-and-control system. Existing models for Spanish have problems recognizing the fundamental frequency and spectral content, the accent, pronunciation, and tone of Cucuta's dialect, or simply lack a language model for it.
Method: In this project, we used a Raspberry Pi B+ embedded system running the Raspbian operating system (a Linux distribution) and two open-source packages based on Hidden Markov Models for computing voice parameters: the CMU-Cambridge Statistical Language Modeling Toolkit from the University of Cambridge and CMU Sphinx from Carnegie Mellon University. In addition, we used 1913 audio recordings of the voices of people from San Jose de Cucuta and the Norte de Santander department for training and testing the automatic speech recognition system.
Results: We obtained a language model consisting of two files: the statistical language model (.lm) and the JSGF grammar model (.jsgf). For the acoustic component, two models were trained, one of them an improved version that achieved a 100% accuracy rate in the training results and an 83% accuracy rate in the audio tests for command recognition. Finally, we wrote a manual for creating acoustic and language models with the CMU Sphinx software.
Conclusions: The number of participants in the training process of the language and acoustic models has a significant influence on the quality of the recognizer's voice processing. Using a large dictionary for training and a short dictionary containing the command words for deployment is important for a better response from the automatic speech recognition system. Given the accuracy rate above 80% in the voice recognition tests, the proposed models are suitable for applications assisting people with visual or motion impairments.

  • Conference Article
  • Citations: 7
  • 10.21437/interspeech.2004-488
Statistical feature language model
  • Oct 4, 2004
  • Salma Jamoussi + 3 more

Statistical language models are widely used in automatic speech recognition in order to constrain the decoding of a sentence. Most of these models derive from the classical n-gram paradigm. However, the production of a word depends on a large set of linguistic features: lexical, syntactic, semantic, etc. Moreover, in some natural languages the gender and number of the left context affect the production of the next word. Therefore, it seems attractive to design a language model based on a variety of word features. We present in this paper a new statistical language model, called the Statistical Feature Language Model (SFLM), based on this idea. In SFLM a word is considered as an array of linguistic features, and the model is defined in a way similar to the n-gram model. Experiments carried out for French show an improvement in terms of perplexity and predicted words.
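The abstract describes SFLM only at a high level. As a rough illustration of treating each token as an array of features, here is a toy bigram model over (word, POS) tuples; the corpus, tags, and probabilities are invented for illustration and are not taken from the paper.

```python
from collections import Counter

def feature_bigram_counts(tagged_sentences):
    """Count bigrams over (word, POS) feature tuples, in the spirit of an
    SFLM-style model where each token is an array of linguistic features."""
    unigrams, bigrams = Counter(), Counter()
    for sent in tagged_sentences:
        tokens = [("<s>", "<s>")] + sent
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams

def cond_prob(bigrams, unigrams, prev, cur):
    """P(cur | prev) by maximum likelihood over feature tuples."""
    return bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 0.0

# Hypothetical POS-tagged French fragments (tags are illustrative only):
corpus = [
    [("la", "DET"), ("maison", "NOUN")],
    [("la", "DET"), ("porte", "NOUN")],
]
uni, bi = feature_bigram_counts(corpus)
print(cond_prob(bi, uni, ("la", "DET"), ("maison", "NOUN")))  # 0.5
```

Because the conditioning context is a feature tuple rather than a bare word, the same machinery can carry gender and number information from the left context, as the abstract motivates.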

  • Book Chapter
  • 10.1007/978-981-10-6496-8_27
Microblog Search Method Based on Neural Network Language Model
  • Sep 21, 2017
  • Jincai Lai + 3 more

Deep neural network language models have seen significant development in natural language processing (NLP) in recent years. In this paper, we focus on using a neural network language model (NNLM) to enhance microblog search, and propose a microblog search method based on a neural network language model (NBSM). First, we train a neural network language model on microblog data to obtain distributed word representations that capture the internal expression patterns of microblogs. Then, we use the distributed representations to find expansion words for users' search terms. Finally, we re-rank microblog search results by combining deep semantic text similarity with social signal features. The proposed method effectively captures microblog expression patterns, and its search results reflect the hot social topics related to users' search terms. Experimental results show that the proposed method yields significant improvements over state-of-the-art methods and significantly improves the user's search experience.
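As a rough sketch of the query-expansion step described above (finding expansion words near the user's search term in a distributed representation space), here is a toy nearest-neighbor lookup. The vectors are invented for illustration, not trained on microblog data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def expand_query(word, embeddings, k=2):
    """Return the k nearest neighbors of a query term in embedding space."""
    scored = [(w, cosine(embeddings[word], vec))
              for w, vec in embeddings.items() if w != word]
    return [w for w, _ in sorted(scored, key=lambda x: -x[1])[:k]]

emb = {  # toy vectors, not trained on microblog data
    "phone":  [0.90, 0.10, 0.00],
    "mobile": [0.85, 0.15, 0.05],
    "pizza":  [0.00, 0.20, 0.90],
}
print(expand_query("phone", emb, k=1))  # ['mobile']
```

In the paper's pipeline the expanded terms feed a retrieval stage, whose results are then re-ranked with semantic similarity and social signals.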

  • Conference Article
  • Citations: 2
  • 10.1109/icecta.2017.8251935
Exploring the language modeling toolkits for Arabic text
  • Nov 1, 2017
  • Fawaz S Al-Anzi + 1 more

Statistical n-gram language models (LMs) have proven very effective in natural language processing (NLP), particularly in automatic speech recognition (ASR) and machine translation. The success of LMs has motivated the introduction of efficient techniques and different model types across linguistic applications. LMs fall mainly into two types: grammars and statistical language models, also called n-grams. The main difference between them is that statistical language models estimate probabilities for word sequences, while grammars usually do not carry probabilities. Although many toolkits can be used to create LMs, this work employs two well-known language modeling toolkits with a focus on Arabic text: the Carnegie Mellon University (CMU)-Cambridge Language Modeling Toolkit and the Cambridge University Hidden Markov Model Toolkit (HTK). For clarification, we used a small Arabic text corpus to compute the 1-gram, 2-gram, and 3-gram counts. In addition, this paper demonstrates the intermediate steps needed to generate ARPA-format LMs with both toolkits.
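The counting step that both toolkits perform can be illustrated in a few lines. This is a generic sketch with placeholder tokens, not the paper's Arabic corpus or either toolkit's actual implementation.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count n-grams of order n, the core statistic behind ARPA-format LMs."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Tiny illustrative corpus (placeholder tokens, not the paper's Arabic data):
tokens = "the model counts the model".split()
for n in (1, 2, 3):
    print(n, ngram_counts(tokens, n).most_common(2))
```

A toolkit such as the CMU-Cambridge one additionally applies discounting and smoothing to these raw counts before writing them out as log-probabilities and backoff weights in ARPA format.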

  • Research Article
  • Citations: 51
  • 10.1111/epi.17570
Are AI language models such as ChatGPT ready to improve the care of individuals with epilepsy?
  • Mar 13, 2023
  • Epilepsia
  • Christian M Boßelmann + 2 more

Epilepsy is a neurological disorder characterized by recurrent seizures, which can significantly impact the quality of life of affected individuals. Fortunately, advances in artificial intelligence (AI) are providing new opportunities to improve the diagnosis and treatment of epilepsy. Briefly, examples of ongoing epilepsy-related AI research include (1) algorithms that can analyze large amounts of electroencephalography (EEG) time-series data to label interictal epileptiform discharges both independently and with human supervision,1, 2 (2) diagnostic biomedical imaging with automated magnetic resonance imaging (MRI)–based lesion detection, surgical decision-making support, and outcome prediction,3, 4 and (3) Clinical Decision Support Systems (CDSS) that use patient data to provide physicians with recommendations based on up-to-date evidence and guidelines for overall improved diagnostic and therapeutic accuracy.5, 6 Language models are often used in chatbots and other conversational systems to generate context-aware, human-like text in response to an input prompt from a user. Such models are trained on large data sets of human conversations using machine learning (ML) techniques to learn the patterns and structure of natural language. Various AI language models have been developed since the 1950s, but significant advances have only been made in recent years due to improved ML models paired with an increased availability of large amounts of data and computational resources. Some of the earliest examples of such models include ELIZA, developed in the 1960s (one of the first programs to simulate a patient-doctor relationship), and SHRDLU from the 1970s (a program able to emulate dialogue around a simplified world with a limited number of objects, the "blocks world").7, 8 However, these early language models were inherently limited in their capabilities and could perform only a narrow range of tasks.
In recent years, more complex, large language models have led to significant progress in natural language processing. Several of these AI language models can be used for dialogue, for example, (1) GPT-3 (Generative Pre-trained Transformer 3), a state-of-the-art language model developed by OpenAI that can generate contextual human-like text for a wide range of applications, including dialogues9; (2) DialoGPT, a language model developed by Microsoft that is trained on a large data set of social media comment chains and can generate responses in single-turn conversations10; (3) Meena, a sensible and specific language model developed by Google that is trained on human–human conversations from public-domain social media and can generate responses that are coherent and contextually appropriate11; and (4) XLNet, a language model developed by Google and Carnegie Mellon University that is capable of several language modeling tasks including question answering, natural language inference, sentiment analysis, and document ranking; and many others.12 Mainly such algorithms enable the analysis of free-text electronic medical records and other written materials (e.g., test results and treatment plans) that are otherwise inaccessible without preprocessing and standardization. By analyzing large amounts of free-text medical records, language models can learn to identify and summarize relevant patterns. Possible outcomes are information on identified hierarchical patient subgroups based on seizure patterns, documented treatment options, and outcome parameters.13-15 This structured information could be queried to provide personalized treatment recommendations based on medical history and other relevant factors. 
For example, by identifying early candidates for epilepsy surgery, language models can help minimize treatment delays and improve patient outcomes.16, 17 Another example of how language models can improve health care are Clinical Decision Support Systems (CDSS) trained to understand and offer natural responses to queries from health care providers. CDSS can provide medical or surgical treatment recommendations, suggest relevant clinical guidelines or protocols, and alert health care providers to potential errors or risks. Similar methods may be used to create virtual assistants for individuals with epilepsy to answer questions and provide easy access to information about their condition, treatment options, and other related topics, including driving, causes of premature death (including sudden unexpected death in epilepsy [SUDEP]), and status epilepticus.18, 19 Overall, AI language models have the future potential to significantly improve the care and management of individuals with epilepsy by providing natural conversational interfaces to both patients and physicians, allowing for easy access to structured information. We tested ChatGPT (ChatGPT Dec 15 Version, available at chat.openai.com, last accessed 01/07/2023 at 9:30 p.m.) for some of the use cases outlined above and provided the prompts used and model responses in Figure 1. First, we assumed the role of an individual with epilepsy taking levetiracetam. The model correctly responded that aggression is a possible side effect and recommended follow-up with the prescribing physician (Figure 1A).20 We then requested an Acute Seizure Action Plan (ASAP), a structured treatment plan used to guide patients and caregivers in the event of an epileptic seizure. 
The model provided a reasonable first draft in line with expert recommendations (Figure 1B).21 We found this useful to quickly generate general patient-facing informational content, but note that each ASAP should be subject to human review to screen for misinformation, and to personalize the draft to include additional information from the individual's medical history and seizure types. We proceeded to present the model with a short, simplified case study of an individual with treatment-resistant left mesial temporal lobe epilepsy. Of interest, the model correctly integrated the medical history and diagnostic findings, noting that hippocampal sclerosis presents an epileptogenic lesion before proceeding to recommend epilepsy surgery. Although this assessment represents a simplification of phase I presurgical evaluation findings and surgical strategies, the overall recommendation is sound.22 However, limitations became apparent when we informed the model that the previously discussed patient now had additional evidence of right temporal lobe seizure onset. Although the initial response is still appropriate, the following advice is actively harmful (Figure 1D). The model confidently states that the patient's health care team may consider bilateral temporal lobectomy or removal of both temporal lobes and the adjacent frontal and parietal lobes (a procedure incorrectly defined as "hemispherotomy" by the model). Finally, even simple queries for structured information may fail if it concerns particularly specialized or disputed areas of knowledge. In Figure 1E, we queried if there is a relationship between variants in SCN9A and autosomal dominant epilepsy. The positive response was incorrect, likely due to misinformation in the academic literature present in the model's training data. 
Any relationship between variants in SCN9A and epilepsy has been refuted.23, 24 Previous research, as outlined above, has focused on language models trained on large amounts of public-domain data of general human conversations, commonly involving text messages from social media sites (Twitter, Reddit, Facebook, etc.) and some additional training data from books or academic literature. Indeed, the use cases shown above do not accurately represent the limits of this tool, as it was likely not trained on a sufficiently extensive, high-quality, domain-specific data set. It is important to note that language models cannot easily deal with disputed areas of knowledge and may not provide correct answers when contradictions are present in the input data. In light of these general considerations and the specific use cases outlined above, we argue that oversight from medical professionals will be needed to distill training information, and that all current AI applications need to be utilized in combination with human expertise. This is made immediately relevant by the fact that the broad ethical and legal implications of generative models are subjects of ongoing debate, with developers denying liability that may then fall onto the clinician user. Another important limitation of language models is an issue coined "hallucination," which describes confidently formulated answers with incorrect or nonsensical content.25 This misinformation is a result of biased training data or mismatches between token encoding and concept representation, and it is particularly difficult to identify. Finally, users should be aware that language models show bias against individuals based on gender, race, or disability.26 This issue is particularly sensitive in epilepsy, where stigma is still prevalent.27 Extraction of structured information from electronic medical records and assistance with simple human-supervised tasks are feasible use-case scenarios. 
However, these systems will need to be thoroughly tested and rigorously validated before they can be used in clinical care, in line with existing regulations on Software as a Medical Device or AI/ML-Enabled Medical Devices.28 Ultimately, AI language models in epilepsy care will depend on developing robust and reliable systems as per the Ethics Guidelines for Trustworthy Artificial Intelligence,29 driven by community-based data sharing and epilepsy-specific AI research. Outside of the clinical care of patients, several successful applications of language models (e.g., smart data processing, content generation, and sentiment analysis) provide a promising perspective of AI-augmented future clinical practice. To achieve similar success stories with AI language models in epilepsy and general clinical practice, we will need to develop protocols for applying decentralized language learning models (i.e., using federated learning) on distributed identifiable patient data from multiple institutions. These coordinated decentralized language models will take advantage of the collective knowledge and insights of multiple sources, including specialty fields like epilepsy, while protecting patient privacy. We confirm that we have read the Journal's position on issues involved in ethical publication and affirm that this report is consistent with those guidelines. Christian M Boßelmann: Conceptualization, Writing – original draft; Costin Leu: Writing – review & editing; Dennis Lal: Writing – review & editing, Supervision. None. The authors report no conflicts of interest.

  • Research Article
  • Citations: 1
  • 10.1145/2422256.2422274
Improving the effectiveness of language modeling approaches to information retrieval
  • Dec 21, 2012
  • ACM SIGIR Forum
  • Yuanhua Lv


  • Conference Article
  • Citations: 41
  • 10.1109/slt.2018.8639699
Transliteration Based Approaches to Improve Code-Switched Speech Recognition Performance
  • Dec 1, 2018
  • Jesse Emond + 4 more

Code-switching is a commonly occurring phenomenon in many multilingual communities, wherein a speaker switches between languages within a single utterance. Conventional Word Error Rate (WER) is not sufficient for measuring the performance of code-mixed languages due to ambiguities in transcription, misspellings and borrowing of words from two different writing systems. These rendering errors artificially inflate the WER of an Automated Speech Recognition (ASR) system and complicate its evaluation. Furthermore, these errors make it harder to accurately evaluate modeling errors originating from code-switched language and acoustic models. In this work, we propose the use of a new metric, transliteration-optimized Word Error Rate (toWER) that smoothes out many of these irregularities by mapping all text to one writing system and demonstrate a correlation with the amount of code-switching present in a language. We also present a novel approach to acoustic and language modeling for bilingual code-switched Indic languages using the same transliteration approach to normalize the data for three types of language models, namely, a conventional n-gram language model, a maximum entropy based language model and a Long Short Term Memory (LSTM) language model, and a state-of-the-art Connectionist Temporal Classification (CTC) acoustic model. We demonstrate the robustness of the proposed approach on several Indic languages from Google Voice Search traffic with significant gains in ASR performance up to 10% relative over the state-of-the-art baseline.
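The core of toWER is scoring after mapping all text into one writing system. A minimal sketch follows, with a one-entry hypothetical transliteration table and a simplified aligned-token error count standing in for full edit-distance WER.

```python
def normalize(tokens, translit_map):
    """Map every token into a single writing system before scoring,
    the core idea behind transliteration-optimized WER (toWER)."""
    return [translit_map.get(t, t) for t in tokens]

def errors(ref, hyp):
    """Word mismatches between aligned token lists (a simplification of
    full edit-distance WER, assuming the sequences are already aligned)."""
    return sum(r != h for r, h in zip(ref, hyp))

# Hypothetical mapping from a Devanagari rendering to a Latin transliteration:
translit = {"नमस्ते": "namaste"}
ref = ["namaste", "world"]
hyp = ["नमस्ते", "world"]
print(errors(ref, hyp))                           # 1: inflated by the script mismatch
print(errors(ref, normalize(hyp, translit)))      # 0 after normalization
```

This shows how a correct hypothesis rendered in a different script artificially inflates the error count, which is exactly the irregularity toWER smooths out.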

  • Research Article
  • Citations: 13
  • 10.11591/ijece.v10i2.pp2102-2109
Improving the role of language model in statistical machine translation (Indonesian-Javanese)
  • Apr 1, 2020
  • International Journal of Electrical and Computer Engineering (IJECE)
  • Herry Sujaini

Statistical machine translation (SMT) has been widely used by researchers and practitioners in recent years. SMT quality is determined by several important factors, two of which are the language model and the translation model. Research on improving the translation model is plentiful, but the problem of optimizing the language model for machine translation has received little attention. Translation systems usually use trigram language models as the standard. In this paper, we conducted experiments with four strategies to analyze the role of the language model in an Indonesian-Javanese translation system and show improvement over the baseline system with the standard language model. The results of this research indicate that the use of 3-gram language models is highly recommended in SMT.
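A toy way to compare language-model orders like those studied above is held-out perplexity. This sketch uses add-one smoothing and invented data, not the paper's corpora or toolkit.

```python
import math
from collections import Counter

def perplexity(train, test, n, vocab_size):
    """Perplexity of an add-one-smoothed n-gram model; lower is better.
    A crude stand-in for the toolkit-trained models compared in the paper."""
    counts = Counter(tuple(train[i:i + n]) for i in range(len(train) - n + 1))
    ctx = Counter(tuple(train[i:i + n - 1]) for i in range(len(train) - n + 2))
    logp, events = 0.0, 0
    for i in range(n - 1, len(test)):
        gram = tuple(test[i - n + 1:i + 1])
        p = (counts[gram] + 1) / (ctx[gram[:-1]] + vocab_size)
        logp += math.log(p)
        events += 1
    return math.exp(-logp / events)

train = "a b a b a b a b".split()
test = "a b a b".split()
print(perplexity(train, test, 2, vocab_size=2))  # bigram
print(perplexity(train, test, 1, vocab_size=2))  # unigram: higher (worse) here
```

On this strongly patterned toy data the higher-order model wins, mirroring the paper's finding that order choice matters for translation quality.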

  • Conference Article
  • Citations: 2
  • 10.1109/icassp.2019.8682606
A Unified Framework for Feature-based Domain Adaptation of Neural Network Language Models
  • May 1, 2019
  • Michael Hentschel + 4 more

An important task for language models is the adaptation of general-domain models to specific target domains. For neural network-based language models, feature-based domain adaptation has been a popular method in previous research. Conventional methods use an adaptation feature providing context information that is calculated from a topic model. However, such a topic model needs to be trained separately from the language model. To unify the language and context model training, we present an approach that combines an extractor network and a domain adaptation layer. The extractor network learns a context representation from a fixed-size window of past words and provides the context information for the adaptation layer. The benefit of our method is that the extractor network can be trained jointly with the language model in a single training step. Our proposed method showed superior performance over conventional domain adaptation with topic features on a dataset of TED talks with respect to perplexity and word error rate after 100-best rescoring.

  • Research Article
  • Citations: 4
  • 10.28932/jutisi.v6i2.2684
Building Acoustic and Language Model for Continuous Speech Recognition in Bahasa Indonesia
  • Aug 10, 2020
  • Jurnal Teknik Informatika dan Sistem Informasi
  • Vincent Elbert Budiman + 1 more

This paper presents the development of an acoustic and a language model. A low Word Error Rate (WER) is an early sign of a good language and acoustic model. Although there are parameters other than WER, our work focused on building a Bahasa Indonesia model with approximately 2000 common words and achieved the target threshold of 25% WER. Several experiments were run with different cases, training data, and testing data, with WER and testing ratio as the main comparison. The language and acoustic models were built using Sphinx4 from Carnegie Mellon University, with a Hidden Markov Model for the acoustic model and an ARPA model for the language model. The model configurations, Beam Width and Force Alignment, correlate directly with WER; they were set to 1e-80 for Beam Width and 1e-60 for Force Alignment to prevent underfitting or overfitting of the acoustic model. The goals of this research are to build continuous speech recognition in Bahasa Indonesia with a low WER and to determine the optimal amounts of training and testing data that minimize it.
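Since WER is the paper's central metric, here is a standard sketch of computing it via word-level edit distance. The example words are illustrative Indonesian-style commands, not drawn from the paper's 2000-word vocabulary.

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance (substitutions, insertions,
    deletions) divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic programming table: d[i][j] = edit distance of ref[:i] vs hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[-1][-1] / len(ref)

# One substitution out of three reference words:
print(round(wer("buka pintu sekarang", "buka jendela sekarang"), 3))  # 0.333
```

The paper's 25% threshold corresponds to a return value of 0.25 from a function like this, averaged over the test utterances.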

  • Conference Article
  • Citations: 5
  • 10.3115/1075168.1075172
The state of the art in language modeling
  • Jan 1, 2003
  • Joshua Goodman

This tutorial will cover the state of the art in language modeling. Language models give the probability of word sequences, i.e. "recognize speech" is much more probable than "wreck a nice beach". While most widely known for their use in speech recognition, language models are useful in a large number of areas, including information retrieval, machine translation, handwriting recognition, context-sensitive spelling correction, and text entry for Chinese and Japanese or on small input devices. Many language modeling techniques can be applied to other areas or to modeling any discrete sequence. This tutorial should be accessible to anyone with a basic knowledge of probability.
The most basic language models, n-gram models, essentially just count occurrences of words in training data. I will describe five relatively simple improvements over this baseline: smoothing, caching, skipping, sentence-mixture models, and clustering. I will talk a bit about the applications of language modeling, then quickly describe other recent promising work and available tools and resources. I will begin by describing conventional-style language modeling techniques.
  • Smoothing addresses the problem of data sparsity: there is rarely enough data to accurately estimate the parameters of a language model. Smoothing gives a way to combine less specific, more accurate information with more specific, but noisier data. I will describe two classic techniques, deleted interpolation and Katz (or Good-Turing) smoothing, and one recent technique, Modified Kneser-Ney smoothing, which is the best known.
  • Caching is a widely used technique that uses the observation that recently observed words are likely to occur again. Models from recently observed data can be combined with more general models to improve performance.
  • Skipping models use the observation that even words that are not directly adjacent to the target word contain useful information.
  • Sentence-mixture models use the observation that there are many different kinds of sentences. By modeling each sentence type separately, performance is improved.
  • Clustering is one of the most useful language modeling techniques. Words can be grouped into clusters through various automatic techniques; then the probability of a cluster can be predicted instead of the probability of the word. Clustering can be used to make smaller or better-performing models. I will talk briefly about clustering issues specific to the huge amounts of data used in language modeling (hundreds of millions of words) to form thousands of clusters.
I will then talk about other language modeling applications, with an emphasis on information retrieval, but also mentioning spelling correction, machine translation, and entering text in Chinese or Japanese. I will briefly describe some recent successful techniques, including Bellegarda's work using latent semantic analysis and Wang's SuperARV language models. Finally, I will also talk about some practical aspects of language modeling: how freely available, off-the-shelf tools can be used to easily build language models, where to get data to train a language model, and how to use methods such as count cutoffs or relative-entropy techniques to prune language models. Those who attend the tutorial should walk away with a broad understanding of current language modeling techniques, and the background needed to build their own language models and choose the right techniques for their applications.
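Of the five improvements, caching is the easiest to sketch in a few lines. Below is a toy unigram cache mixed with a fixed base model; the interpolation weight, window size, and base probabilities are invented for illustration.

```python
from collections import deque

class CacheLM:
    """Unigram cache mixed with a fixed base model:
    P(w) = (1 - lam) * P_base(w) + lam * count_recent(w) / window_fill.
    A minimal sketch of the 'caching' idea from the tutorial."""

    def __init__(self, base_probs, window=100, lam=0.1):
        self.base = base_probs          # hypothetical base unigram probabilities
        self.window = deque(maxlen=window)  # recent-history buffer
        self.lam = lam                  # cache interpolation weight

    def prob(self, word):
        cache_p = (self.window.count(word) / len(self.window)
                   if self.window else 0.0)
        return (1 - self.lam) * self.base.get(word, 1e-6) + self.lam * cache_p

    def observe(self, word):
        self.window.append(word)

base = {"beach": 0.001, "the": 0.05}
lm = CacheLM(base, window=10, lam=0.1)
before = lm.prob("beach")
for _ in range(5):
    lm.observe("beach")
print(lm.prob("beach") > before)  # True: recent use raises the probability
```

The mixture weight `lam` plays the same role as the interpolation weights in the smoothing techniques the tutorial describes, and would normally be tuned on held-out data.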

  • Conference Article
  • Citations: 26
  • 10.1109/icassp.2019.8683481
Improvements to N-gram Language Model Using Text Generated from Neural Language Model
  • May 1, 2019
  • Masayuki Suzuki + 4 more

Although neural language models have emerged, n-gram language models are still used for many speech recognition tasks. This paper proposes four methods for improving n-gram language models using text generated from a recurrent neural network language model (RNNLM). First, we use multiple RNNLMs from different domains instead of a single RNNLM; the final n-gram language model is obtained by interpolating the n-gram models generated from each domain. Second, we use subwords instead of words in the RNNLM to reduce the out-of-vocabulary rate. Third, we generate text templates with an RNNLM for template-based data augmentation of named entities. Fourth, we use both a forward RNNLM and a backward RNNLM to generate text. We found that these four methods improved speech recognition performance by up to 4% relative across various tasks.

  • Research Article
  • Citations: 4
  • 10.1016/j.cose.2024.103947
Fuzzing JavaScript engines with a syntax-aware neural program model
  • Jun 8, 2024
  • Computers & Security
  • Haoran Xu + 5 more


  • Conference Article
  • Citations: 6
  • 10.23919/spa.2017.8166885
Polish language modelling for speech recognition application
  • Sep 1, 2017
  • Piotr Klosowski

The article presents statistical word-based and phoneme-based language models for automatic speech recognition in Polish. Appropriate orthographic and phonemic language corpora allow statistical analysis of the language and the development of statistical word-based and phoneme-based language models. Such statistical language models help to predict the sequence of recognized words and phonemes. The developed models have been compared, and the one best suited to automatic speech recognition in Polish has been proposed. Word-based and phoneme-based language models can be combined into hybrid language models and effectively contribute to improving the effectiveness of statistical speech recognition. The achieved results and conclusions can also be applied to speech recognition for other languages.

  • Conference Article
  • Citations: 16
  • 10.1109/icassp.2002.5743835
Rescoring effectiveness of language models using different levels of knowledge and their integration
  • May 1, 2002
  • Wen Wang + 2 more

In this paper, we compare the efficacy of a variety of language models (LMs) for rescoring word graphs and N-best lists generated by a large vocabulary continuous speech recognizer. These LMs differ based on the level of knowledge used (word, lexical features, syntax) and the type of integration of that knowledge (tight or loose). The trigram LM incorporates word level information; our Part-of-Speech (POS) LM uses word and lexical class information in a tightly coupled way; our new SuperARV LM tightly integrates word, a richer set of lexical features than POS, and syntactic dependency information; and the Parser LM integrates some limited word information, POS, and syntactic information. We also investigate LMs created using a linear interpolation of LM pairs. When comparing each LM on the task of rescoring word graphs or N-best lists for the Wall Street Journal (WSJ) 5k- and 20k- vocabulary test sets, the SuperARV LM always achieves the greatest reduction in word error rate (WER) and the greatest increase in sentence accuracy (SAC). On the 5k test sets, the SuperARV LM obtains more than a 10% relative reduction in WER compared to the trigram LM, and on the 20k test set more than 2%. Additionally, the SuperARV LM performs comparably to or better than the interpolated LMs. Hence, we conclude that the tight coupling of knowledge from all three levels is an effective method of constructing high quality LMs.
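The linear interpolation of LM pairs mentioned above weights two models' probabilities, and the weight is commonly tuned by grid search over held-out log-likelihood. A minimal sketch with invented toy models (the probabilities and held-out words are hypothetical):

```python
import math

def best_lambda(lm_a, lm_b, heldout, grid=None):
    """Pick the interpolation weight maximizing held-out log-likelihood for
    P(w) = lam * P_a(w) + (1 - lam) * P_b(w)."""
    grid = grid or [i / 10 for i in range(1, 10)]

    def loglik(lam):
        return sum(math.log(lam * lm_a(w) + (1 - lam) * lm_b(w))
                   for w in heldout)

    return max(grid, key=loglik)

lm_a = lambda w: 0.5 if w == "stocks" else 0.01  # hypothetical in-domain model
lm_b = lambda w: 0.1                             # hypothetical flat model
heldout = ["stocks"] * 9 + ["bonds"]
print(best_lambda(lm_a, lm_b, heldout))  # 0.9: the held-out data favors lm_a
```

The paper's finding that the SuperARV LM matches or beats the interpolated pairs suggests that tightly integrating the knowledge sources can outperform this kind of after-the-fact mixing.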
