How Linguistics Learned to Stop Worrying and Love the Language Models.
Language models can produce fluent, grammatical text. Nonetheless, some maintain that language models don't really learn language and also that, even if they did, that would not be informative for the study of human learning and processing. On the other side, there have been claims that the success of LMs obviates the need for studying linguistic theory and structure. We argue that both extremes are wrong. LMs can contribute to fundamental questions about linguistic structure, language processing, and learning. They force us to rethink arguments and ways of thinking that have been foundational in linguistics. While they do not replace linguistic structure and theory, they serve as model systems and working proofs of concept for gradient, usage-based approaches to language. We offer an optimistic take on the relationship between language models and linguistics.
- Research Article
14
- 10.1098/rsta.2000.0588
- Apr 15, 2000
- Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences
Incorporating linguistic structure into statistical language models. Ronald Rosenfeld, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. Statistical language models estimate the distribution of natural language for the purpose of improving various language technology applications. Ironically, the most successful models of this type take little advantage of the nature of language. I review the extent to which various aspects of natural language are captured in current models. I then describe a general framework, recently developed at our laboratory, for incorporating arbitrary linguistic structure into a statistical framework, and present a methodology for eliciting linguistic features currently missing from the model. Finally, I ponder our failure heretofore to integrate linguistic theories into a statistical framework, and suggest possible reasons for it.
- Research Article
23
- 10.3389/frai.2022.777963
- Mar 3, 2022
- Frontiers in Artificial Intelligence
Expectation-based theories of sentence processing posit that processing difficulty is determined by predictability in context. While predictability quantified via surprisal has gained empirical support, this representation-agnostic measure leaves open the question of how to best approximate the human comprehender's latent probability model. This article first describes an incremental left-corner parser that incorporates information about common linguistic abstractions such as syntactic categories, predicate-argument structure, and morphological rules as a computational-level model of sentence processing. The article then evaluates a variety of structural parsers and deep neural language models as cognitive models of sentence processing by comparing the predictive power of their surprisal estimates on self-paced reading, eye-tracking, and fMRI data collected during real-time language processing. The results show that surprisal estimates from the proposed left-corner processing model deliver comparable and often superior fits to self-paced reading and eye-tracking data when compared to those from neural language models trained on much more data. This may suggest that the strong linguistic generalizations made by the proposed processing model may help predict humanlike processing costs that manifest in latency-based measures, even when the amount of training data is limited. Additionally, experiments using Transformer-based language models sharing the same primary architecture and training data show a surprising negative correlation between parameter count and fit to self-paced reading and eye-tracking data. These findings suggest that large-scale neural language models are making weaker generalizations based on patterns of lexical items rather than stronger, more humanlike generalizations based on linguistic structure.
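As a concrete illustration of the surprisal measure evaluated above, here is a minimal sketch that computes per-token surprisal from a pretrained causal language model. GPT-2 via the Hugging Face transformers library is an illustrative stand-in, not the article's left-corner parser or its specific neural baselines; in a study like this, the resulting values would then be regressed against reading times or fMRI signals.

```python
# Minimal sketch: per-token surprisal (in bits) from a pretrained causal LM.
# GPT-2 is an illustrative choice, not the model evaluated in the article.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_surprisals(sentence: str):
    """Return (token, surprisal-in-bits) pairs under the LM's left-to-right predictions."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                 # [1, seq_len, vocab]
    log_probs = torch.log_softmax(logits, dim=-1)
    results = []
    for pos in range(1, ids.shape[1]):             # the first token has no left context
        logp = log_probs[0, pos - 1, ids[0, pos]]  # log P(token | preceding tokens)
        results.append((tokenizer.decode(ids[0, pos]), -logp.item() / math.log(2)))
    return results

print(token_surprisals("The horse raced past the barn fell."))
```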
- Research Article
1
- 10.5539/ijel.v5n6p95
- Nov 30, 2015
- International Journal of English Linguistics
Research on second language acquisition (SLA) and use has always been enriched by linguistic schools and theories. The purpose of the present paper is to give readers a snapshot of the contributions that grand linguistic theories have made to L2 acquisition research and pedagogy. The grand linguistic theories chosen for review are Structural Linguistics, Nativism, Functional Linguistics, and Cognitive Linguistics. These four theories have received, and in some cases continue to receive, far more attention in the field of linguistics than other theories, and SLA research and pedagogy have been strongly influenced by them. Their impacts on the two areas have not been equal, however: some of the theories have influenced SLA research more, while others have had greater implications for SLA pedagogy. The contributions of each grand linguistic theory to SLA research and pedagogy are discussed, along with the criticisms of those contributions raised by rival researchers.
- Conference Article
2
- 10.1109/worlds4.2019.8903977
- Jul 1, 2019
State-of-the-art neural language models rely heavily on a pre-training process built over a fixed-size subword vocabulary to improve language-understanding performance. Existing subword segmentation algorithms generate a fixed-size vocabulary by frequency over a large text pool without considering linguistic structure, and most research has focused on widely used languages such as English and Chinese. For Korean, however, a segmentation algorithm that respects linguistic structure is required, so we propose an algorithm tailored to the characteristics of Korean. For example, Korean words include a special type of suffix called "Josa" that adds grammatical meaning. We also propose a subword regularization algorithm based on mutual information, the statistical interdependence between words; the regularization algorithm customizes the vocabulary size on demand. In addition, we present an experimental analysis of subword segmentation and interdependence-based regularization by testing a neural language model, which achieves better performance with only small changes to the vocabulary.
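To make the mutual-information idea concrete, here is a minimal sketch that scores adjacent subword pairs by pointwise mutual information (PMI) and greedily merges the strongest pairs. The toy syllable-level corpus, the PMI threshold, and the greedy merge loop are illustrative assumptions, not the paper's exact regularization algorithm.

```python
# Hedged sketch: merge adjacent subword pairs by PMI rather than raw frequency,
# in the spirit of the mutual-information-based regularization described above.
import math
from collections import Counter

def pmi_merges(corpus_tokens, min_pmi=2.0, max_merges=5):
    """Greedily merge adjacent token pairs whose PMI exceeds min_pmi."""
    tokens = list(corpus_tokens)
    merges = []
    for _ in range(max_merges):
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        total = len(tokens)

        def pmi(pair):
            p_xy = bigrams[pair] / (total - 1)
            p_x = unigrams[pair[0]] / total
            p_y = unigrams[pair[1]] / total
            return math.log2(p_xy / (p_x * p_y))

        best = max(bigrams, key=pmi)
        if pmi(best) < min_pmi:
            break
        merges.append(best)
        merged, i = [], 0                      # apply the merge to the token stream
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1]); i += 2
            else:
                merged.append(tokens[i]); i += 1
        tokens = merged
    return merges

# Toy Korean syllable sequence (illustrative only); "에" is a Josa-like particle.
print(pmi_merges("학 교 에 갔 다 학 교 에 서".split()))
```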
- Research Article
- 10.1515/ip-2025-2001
- Apr 28, 2025
- Intercultural Pragmatics
Language models are mathematical functions and, as such, induce vector spaces in which input is embedded. Comparing the point clouds of concept vectors across such language models and similar computer vision models, we see surprising similarities. This sheds new light on the Innateness Debate. Much linguistic structure can be induced from extra-linguistic data. Language models are generally thought to be too sample-inefficient to be good models of language acquisition, but what about language models initialized by computer vision models?
- Conference Article
7
- 10.1109/vis47514.2020.00062
- Oct 1, 2020
In this paper we introduce a method for visually analyzing contextualized embeddings produced by deep neural network-based language models. Our approach is inspired by linguistic probes for natural language processing, where tasks are designed to probe language models for linguistic structure, such as parts-of-speech and named entities. These approaches are largely confirmatory, however, only enabling a user to test for information known a priori. In this work, we eschew supervised probing tasks, and advocate for unsupervised probes, coupled with visual exploration techniques, to assess what is learned by language models. Specifically, we cluster contextualized embeddings produced from a large text corpus, and introduce a visualization design based on this clustering and textual structure – cluster co-occurrences, cluster spans, and cluster-word membership – to help elicit the functionality of, and relationship between, individual clusters. User feedback highlights the benefits of our design in discovering different types of linguistic structures.
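A minimal sketch of the unsupervised-probe idea: extract contextualized token embeddings from a pretrained encoder, cluster them, and inspect which words land in each cluster. The choice of BERT, the final hidden layer, the two toy sentences, and k = 4 are assumptions for illustration, not the paper's settings or visualization design.

```python
# Hedged sketch: cluster contextualized token embeddings and inspect cluster membership.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["The bank raised interest rates.", "They sat on the river bank."]
words, vectors = [], []
with torch.no_grad():
    for s in sentences:
        enc = tokenizer(s, return_tensors="pt")
        hidden = model(**enc).last_hidden_state[0]        # [seq_len, hidden_dim]
        toks = tokenizer.convert_ids_to_tokens(enc.input_ids[0].tolist())
        for tok, vec in zip(toks, hidden):
            if tok not in ("[CLS]", "[SEP]"):             # skip special tokens
                words.append(tok)
                vectors.append(vec.numpy())

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(vectors)
for cluster in range(4):
    print(cluster, [w for w, l in zip(words, labels) if l == cluster])
```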
- Book Chapter
- 10.1007/978-981-10-6496-8_27
- Sep 21, 2017
Deep neural network language models have developed rapidly within natural language processing (NLP) in recent years. In this paper we focus on using a neural network language model (NNLM) to enhance microblog search, and propose a microblog search method based on a neural network language model (NBSM). First, we train a neural network language model on microblog data to obtain distributed word representations that capture the internal expression patterns of microblogs. Then we use these distributed representations to find expansion words for the user's search terms. Finally, we re-rank the microblog search results by combining deep semantic text similarity with social signal features. The proposed method effectively captures microblog expression patterns, and its results reflect the social hot topics related to the user's search terms. Experimental results show that the proposed method yields significant improvements over state-of-the-art methods and significantly improves the user's search experience.
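The query-expansion step can be sketched as follows, with gensim's Word2Vec standing in for the chapter's neural network language model; the toy corpus and hyperparameters are purely illustrative assumptions.

```python
# Hedged sketch: expand a query with nearest neighbours in a word-embedding space
# trained on (micro)blog-like text. Word2Vec is a stand-in for the paper's NNLM.
from gensim.models import Word2Vec

corpus = [
    ["world", "cup", "final", "tonight"],
    ["watching", "the", "world", "cup", "with", "friends"],
    ["football", "final", "score", "update"],
]
w2v = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=50)

def expand_query(terms, topn=2):
    """Return the original terms plus their nearest neighbours in embedding space."""
    expanded = list(terms)
    for t in terms:
        if t in w2v.wv:
            expanded += [w for w, _ in w2v.wv.most_similar(t, topn=topn)]
    return expanded

print(expand_query(["world", "cup"]))
```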
- Conference Article
1
- 10.1109/icecta.2017.8251935
- Nov 1, 2017
Statistical N-gram language models (LMs) have been shown to be very effective in natural language processing (NLP), particularly in automatic speech recognition (ASR) and machine translation. The success of LMs has encouraged the development of efficient techniques and of different model types across a variety of linguistic applications. LMs fall mainly into two types: grammars and statistical language models, also called N-grams. The main difference between them is that statistical language models estimate probabilities for word sequences, whereas grammars usually carry no probabilities. Although many toolkits can be used to create LMs, this work employs two well-known language modeling toolkits with a focus on Arabic text: the Carnegie Mellon University (CMU)-Cambridge Language Modeling Toolkit and the Cambridge University Hidden Markov Model Toolkit (HTK). For clarification, we use a small Arabic text corpus to compute 1-gram, 2-gram, and 3-gram models, and we demonstrate the intermediate steps needed to generate ARPA-format LMs with both toolkits.
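For readers unfamiliar with what such toolkits compute, the sketch below derives maximum-likelihood 1-, 2-, and 3-gram probabilities from a toy English corpus. It omits smoothing and back-off weights; real toolkits such as the CMU-Cambridge toolkit or HTK apply discounting before writing ARPA-format files, and the corpus here is an assumption, not the paper's Arabic data.

```python
# Minimal sketch: unsmoothed maximum-likelihood n-gram probabilities from counts.
from collections import Counter

sentences = [["<s>", "the", "cat", "sat", "</s>"],
             ["<s>", "the", "dog", "sat", "</s>"]]

def ngram_counts(sents, n):
    c = Counter()
    for s in sents:
        c.update(tuple(s[i:i + n]) for i in range(len(s) - n + 1))
    return c

uni, bi, tri = (ngram_counts(sentences, n) for n in (1, 2, 3))
total = sum(uni.values())

def p(ngram):
    """MLE probability of the last word given its history."""
    if len(ngram) == 1:
        return uni[ngram] / total
    hist = ngram[:-1]
    denom = (uni if len(hist) == 1 else bi)[hist]
    return (bi if len(ngram) == 2 else tri)[ngram] / denom

print(p(("the",)), p(("the", "cat")), p(("<s>", "the", "cat")))
```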
- Research Article
3
- 10.4233/uuid:d9a0ae1d-3336-4e43-bc3d-7a3a06461f54
- Mar 11, 2014
Language modeling plays a critical role in natural language processing and understanding. Starting from a general structure, language models are able to learn natural language patterns from rich input data. However, state-of-the-art language models take advantage only of the words themselves, which is not sufficient to characterize the language. In this thesis, we improve recurrent neural network language models (RNNLMs) by training them with additional information, and we propose different methods of integrating different types of additional information into RNNLMs. All potential information beyond the word itself that can be used to characterize the language is called meta-information. We propose to use several types of meta-information to represent language: discourse-level information, reflected in the whole discourse; sentence-level information, which characterizes the patterns of sentences; and morphological information, which represents the word from different perspectives.

As an example, consider the following Dutch paragraph, in which each sentence is delimited by a sentence-beginning marker and a sentence-ending marker: "kan allemaal nog natuurlijk maar ze ontlopen dan de groepswinnaar in elk geval in de kwartfinale en vooral Nederland wil graag in Rotterdam die kwartfinale spelen en dan moet er groepswinst behaald worden anders verhuizen ze naar Brugge en krijgt het Jan Breydelstadion Oranje dus op bezoek we gaan er even uit slotfase zit eraan te komen twee minuten nog tot het einde plus de toegevoegde tijd dat is uh toch nog ook wel een paar minuten denk ik maar de wedstrijd is gespeeld". On the discourse level, this paragraph is labeled as "Live commentaries (broadcast)" from the socio-situational setting (SSS) perspective and as "sport" from the topic perspective. On the sentence level, each word except the sentence-beginning and sentence-ending markers is annotated with its preceding-word and succeeding-word information. For example, in the sentence "slotfase zit eraan te komen", the word "slotfase" has the sentence-beginning marker as its preceding information and "zit eraan te komen" as its succeeding information. On the word level, the word "slotfase" is annotated with a vector containing some of the proposed meta-information.

On the discourse level, we investigate classification methods for socio-situational settings and topics. On the sentence level, we focus on information such as succeeding-word information and whole-sentence information; each word is annotated with a vector containing the collected meta-information. Different methods are proposed to integrate the meta-information into language models: on the discourse level, a curriculum learning method combines the socio-situational settings and topics; on the sentence level, forward-backward recurrent neural network language models integrate succeeding-word and whole-sentence information; and on the word level, each word is conditioned on its preceding words as well as on the preceding meta-information. The results reported in this thesis show that meta-information can improve the effectiveness of language models at the cost of increased training time. We address this cost by applying parallel processing techniques and propose a subsampling stochastic gradient descent algorithm to accelerate the training of recurrent neural network language models.
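One simple way to condition a recurrent language model on meta-information is sketched below: a meta-information vector (for example, an embedding of a socio-situational-setting or topic label) is concatenated to each word embedding. The dimensions, the GRU cell, and the single meta vector per discourse are illustrative assumptions, not the thesis's exact architectures.

```python
# Hedged sketch: an RNN language model conditioned on a per-discourse meta vector.
import torch
import torch.nn as nn

class MetaRNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, meta_dim=8, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim + meta_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, word_ids, meta_vec):
        # word_ids: [batch, seq_len]; meta_vec: [batch, meta_dim], repeated at every step
        emb = self.embed(word_ids)
        meta = meta_vec.unsqueeze(1).expand(-1, emb.size(1), -1)
        h, _ = self.rnn(torch.cat([emb, meta], dim=-1))
        return self.out(h)                                # next-word logits per position

model = MetaRNNLM(vocab_size=1000)
logits = model(torch.randint(0, 1000, (2, 7)), torch.randn(2, 8))
print(logits.shape)   # torch.Size([2, 7, 1000])
```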
- Research Article
44
- 10.1111/epi.17570
- Mar 13, 2023
- Epilepsia
Epilepsy is a neurological disorder characterized by recurrent seizures, which can significantly impact the quality of life of affected individuals. Fortunately, advances in artificial intelligence (AI) are providing new opportunities to improve the diagnosis and treatment of epilepsy. Briefly, examples of ongoing epilepsy-related AI research include (1) algorithms that can analyze large amounts of electroencephalography (EEG) time-series data to label interictal epileptiform discharges both independently and with human supervision,1, 2 (2) diagnostic biomedical imaging with automated magnetic resonance imaging (MRI)–based lesion detection, surgical decision-making support, and outcome prediction,3, 4 and (3) Clinical Decision Support Systems (CDSS) that use patient data to provide physicians with recommendations based on up-to-date evidence and guidelines for overall improved diagnostic and therapeutic accuracy.5, 6 Language models are often used in chatbots and other conversational systems to generate context-aware human-like text in response to an input prompt from a user. Such models are trained on large data sets of human conversations using machine learning (ML) techniques to learn the patterns and structure of natural language. Various artificial intelligence (AI) language models have been developed since the 1950s, but significant advances have only been made in recent years due to improved ML models paired with an increased availability of large amounts of data and computational resources. Some of the earliest examples of such models include ELIZA, developed in the 1960s (one of the first programs to simulate a patient-doctor relationship), and SHRDLU from the 1970s (a program able to emulate dialogue around a simplified world with a limited number of objects, the "blocks world").7, 8 However, these early language models were inherently limited in their capabilities and could perform only a narrow range of tasks. In recent years, more complex, large language models have led to significant progress in natural language processing. Several of these AI language models can be used for dialogue, for example, (1) GPT-3 (Generative Pre-trained Transformer 3), a state-of-the-art language model developed by OpenAI that can generate contextual human-like text for a wide range of applications, including dialogues9; (2) DialoGPT, a language model developed by Microsoft that is trained on a large data set of social media comment chains and can generate responses in single-turn conversations10; (3) Meena, a sensible and specific language model developed by Google that is trained on human–human conversations from public-domain social media and can generate responses that are coherent and contextually appropriate11; and (4) XLNet, a language model developed by Google and Carnegie Mellon University that is capable of several language modeling tasks including question answering, natural language inference, sentiment analysis, and document ranking; and many others.12 Such algorithms mainly enable the analysis of free-text electronic medical records and other written materials (e.g., test results and treatment plans) that are otherwise inaccessible without preprocessing and standardization. By analyzing large amounts of free-text medical records, language models can learn to identify and summarize relevant patterns.
Possible outcomes are information on identified hierarchical patient subgroups based on seizure patterns, documented treatment options, and outcome parameters.13-15 This structured information could be queried to provide personalized treatment recommendations based on medical history and other relevant factors. For example, by identifying early candidates for epilepsy surgery, language models can help minimize treatment delays and improve patient outcomes.16, 17 Another example of how language models can improve health care is Clinical Decision Support Systems (CDSS) trained to understand and offer natural responses to queries from health care providers. CDSS can provide medical or surgical treatment recommendations, suggest relevant clinical guidelines or protocols, and alert health care providers to potential errors or risks. Similar methods may be used to create virtual assistants for individuals with epilepsy to answer questions and provide easy access to information about their condition, treatment options, and other related topics, including driving, causes of premature death (including sudden unexpected death in epilepsy [SUDEP]), and status epilepticus.18, 19 Overall, AI language models have the future potential to significantly improve the care and management of individuals with epilepsy by providing natural conversational interfaces to both patients and physicians, allowing for easy access to structured information. We tested ChatGPT (ChatGPT Dec 15 Version, available at chat.openai.com, last accessed 01/07/2023 at 9:30 p.m.) for some of the use cases outlined above and provided the prompts used and model responses in Figure 1. First, we assumed the role of an individual with epilepsy taking levetiracetam. The model correctly responded that aggression is a possible side effect and recommended follow-up with the prescribing physician (Figure 1A).20 We then requested an Acute Seizure Action Plan (ASAP), a structured treatment plan used to guide patients and caregivers in the event of an epileptic seizure. The model provided a reasonable first draft in line with expert recommendations (Figure 1B).21 We found this useful to quickly generate general patient-facing informational content, but note that each ASAP should be subject to human review to screen for misinformation, and to personalize the draft to include additional information from the individual's medical history and seizure types. We proceeded to present the model with a short, simplified case study of an individual with treatment-resistant left mesial temporal lobe epilepsy. Of interest, the model correctly integrated the medical history and diagnostic findings, noting that hippocampal sclerosis represents an epileptogenic lesion before proceeding to recommend epilepsy surgery. Although this assessment represents a simplification of phase I presurgical evaluation findings and surgical strategies, the overall recommendation is sound.22 However, limitations became apparent when we informed the model that the previously discussed patient now had additional evidence of right temporal lobe seizure onset. Although the initial response is still appropriate, the following advice is actively harmful (Figure 1D). The model confidently states that the patient's health care team may consider bilateral temporal lobectomy or removal of both temporal lobes and the adjacent frontal and parietal lobes (a procedure incorrectly defined as "hemispherotomy" by the model).
Finally, even simple queries for structured information may fail if they concern particularly specialized or disputed areas of knowledge. In Figure 1E, we queried if there is a relationship between variants in SCN9A and autosomal dominant epilepsy. The positive response was incorrect, likely due to misinformation in the academic literature present in the model's training data. Any relationship between variants in SCN9A and epilepsy has been refuted.23, 24 Previous research, as outlined above, has focused on language models trained on large amounts of public-domain data of general human conversations, commonly involving text messages from social media sites (Twitter, Reddit, Facebook, etc.) and some additional training data from books or academic literature. Indeed, the use cases shown above do not accurately represent the limits of this tool, as it was likely not trained on a sufficiently extensive, high-quality, domain-specific data set. It is important to note that language models cannot easily deal with disputed areas of knowledge and may not provide correct answers when contradictions are present in the input data. In light of these general considerations and the specific use cases outlined above, we argue that oversight from medical professionals will be needed to distill training information, and that all current AI applications need to be utilized in combination with human expertise. This is made immediately relevant by the fact that the broad ethical and legal implications of generative models are subjects of ongoing debate, with developers denying liability that may then fall onto the clinician user. Another important limitation of language models is an issue coined "hallucination," which describes confidently formulated answers with incorrect or nonsensical content.25 This misinformation is a result of biased training data or mismatches between token encoding and concept representation, and it is particularly difficult to identify. Finally, users should be aware that language models show bias against individuals based on gender, race, or disability.26 This issue is particularly sensitive in epilepsy, where stigma is still prevalent.27 Extraction of structured information from electronic medical records and assistance with simple human-supervised tasks are feasible use-case scenarios. However, these systems will need to be thoroughly tested and rigorously validated before they can be used in clinical care, in line with existing regulations on Software as a Medical Device or AI/ML-Enabled Medical Devices.28 Ultimately, AI language models in epilepsy care will depend on developing robust and reliable systems as per the Ethics Guidelines for Trustworthy Artificial Intelligence,29 driven by community-based data sharing and epilepsy-specific AI research. Outside of the clinical care of patients, several successful applications of language models (e.g., smart data processing, content generation, and sentiment analysis) provide a promising perspective of AI-augmented future clinical practice. To achieve similar success stories with AI language models in epilepsy and general clinical practice, we will need to develop protocols for applying decentralized language learning models (i.e., using federated learning) on distributed identifiable patient data from multiple institutions. These coordinated decentralized language models will take advantage of the collective knowledge and insights of multiple sources, including specialty fields like epilepsy, while protecting patient privacy.
- Book Chapter
- 10.1007/978-81-322-2752-6_60
- Jan 1, 2016
Today’s speech recognizers use very little knowledge of what language really is. They treat a sentence as if it were generated by a random process and pay little or no attention to its linguistic structure. If recognizers knew about the rules of grammar, they would potentially make fewer recognition errors. Highly linguistically motivated grammars that are able to capture the deeper structure of language have evolved within the natural language processing community over the last few years, yet the speech recognition community mainly applies models that disregard that structure, or only very coarse probabilistic grammars. This paper aims to bridge the gap between statistical language models and elaborate linguistic grammars. It first analyzes the need to integrate conventional statistical language models with modern linguistic-knowledge-based language models, thereby motivating a combined statistical and linguistic-knowledge-based speech recognition system that is asymptotically error-free.
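A common, minimal way to combine the two knowledge sources discussed here is linear interpolation of a statistical n-gram estimate with a grammar-based estimate. The stub estimators and the weight below are illustrative assumptions, not the paper's integration scheme.

```python
# Hedged sketch: linearly interpolate a statistical estimate with a grammar-based one.
def interpolated_prob(word, history, p_ngram, p_grammar, lam=0.7):
    """P(word | history) as a weighted mix of statistical and grammar-based estimates."""
    return lam * p_ngram(word, history) + (1.0 - lam) * p_grammar(word, history)

# Stub estimators standing in for a real n-gram model and a probabilistic grammar.
def p_ngram(word, history):
    return {"sat": 0.4, "ran": 0.3}.get(word, 0.01)

def p_grammar(word, history):
    # A toy "grammar" that licenses verbs after the noun phrase "the cat".
    return 0.5 if word in ("sat", "ran") else 0.001

print(interpolated_prob("sat", ("the", "cat"), p_ngram, p_grammar))
```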
- Research Article
- 10.6342/ntu.2015.00784
- Jan 1, 2015
The inestimable volumes of multimedia associated with spoken documents that have been made available to the public in the past two decades have brought spoken document understanding and organization to the forefront as subjects of research. Among all the related subtasks, spoken document indexing, retrieval and summarization can be thought of as the cornerstones of this research area. Statistical language modeling (LM), which purports to quantify the acceptability of a given piece of text, has long been an interesting yet challenging research area, and language modeling for spoken document processing has enjoyed remarkable empirical success. Motivated by the great importance of and interest in language modeling for various spoken document processing tasks (i.e., indexing, retrieval and summarization), language modeling is the backbone of this thesis. In real-world applications, a serious challenge faced by the search engine is that queries usually consist of only a few words to address users’ information needs. This thesis starts with a general survey of this practical challenge, and then not only proposes a principled framework which can unify the relationships among several widely used approaches but also extends this school of techniques to spoken document summarization tasks. Next, inspired by the concept of the i-vector technique, an i-vector based language modeling framework is proposed for spoken document retrieval and reformulated to accurately represent users’ information needs. We then note that language models have shown preliminary success in extractive speech summarization, but a central challenge facing the LM approach is how to formulate sentence models and accurately estimate their parameters for each sentence in the spoken document to be summarized. We therefore propose a framework which builds on the notion of recurrent neural network language models and a curriculum learning strategy, which shows promise in capturing not only word usage cues but also long-span structural information about word co-occurrence relationships within spoken documents, thus eliminating the need for the strict bag-of-words assumption made by most existing LM-based methods. Lastly, word embedding has become a popular research area due to its excellent performance in many natural language processing (NLP)-related tasks, yet relatively few studies have investigated its use in extractive text or speech summarization. The thesis therefore first builds novel and efficient ranking models based on general word embedding methods for extractive speech summarization, and then proposes a novel probabilistic modeling framework for learning word and sentence representations, which not only inherits the advantages of the original word embedding methods but also boasts a clear and rigorous probabilistic foundation.
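As a rough illustration of embedding-based ranking for extractive summarization, the sketch below scores each sentence by cosine similarity between its averaged word vector and the document centroid. The tiny hand-made embedding table and the centroid criterion are illustrative assumptions, not the thesis's trained representations or ranking models.

```python
# Hedged sketch: rank sentences by similarity of their average word vector to the
# document centroid; toy 3-d "embeddings" stand in for trained representations.
import numpy as np

embeddings = {
    "storm": np.array([0.9, 0.1, 0.0]), "rain": np.array([0.8, 0.2, 0.1]),
    "flood": np.array([0.7, 0.3, 0.1]), "game": np.array([0.1, 0.9, 0.2]),
    "score": np.array([0.0, 0.8, 0.3]),
}

def sent_vec(words):
    vecs = [embeddings[w] for w in words if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

doc = [["storm", "brought", "rain"], ["flood", "warnings", "issued"], ["game", "score", "tied"]]
centroid = np.mean([sent_vec(s) for s in doc], axis=0)
ranked = sorted(doc, key=lambda s: cosine(sent_vec(s), centroid), reverse=True)
print(ranked[0])   # the sentence most central to the document
```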
- Conference Article
32
- 10.1109/icassp.1993.319223
- Jan 1, 1993
Linguistic structure in the form of a partial-coverage phrase structure grammar is combined with statistical N-gram techniques. The result is a robust statistical grammar which explicitly incorporates linguistic and semantic structure. This approach makes it possible to model carefully those parts of the input that are important for an application and to use robust techniques that provide a full-coverage statistical language model. This approach is being applied to the recognition of air-traffic-control transmissions, and it has already been shown that a simpler hybrid approach is useful.
- Dissertation
- 10.20868/upm.thesis.58115
- Mar 3, 2020
Contributions to Speech and Language processing towards Automatic Speech Recognizers with Evolving Dictionaries
- Conference Article
- 10.18653/v1/2022.acl-long.455
- Jan 1, 2022
Representation of linguistic phenomena in computational language models is typically assessed against the predictions of existing linguistic theories of these phenomena. Using the notion of polarity as a case study, we show that this is not always the most adequate set-up. We probe polarity via so-called 'negative polarity items' (in particular, English 'any') in two pre-trained Transformer-based models (BERT and GPT-2). We show that -- at least for polarity -- metrics derived from language models are more consistent with data from psycholinguistic experiments than linguistic theory predictions. Establishing this allows us to more adequately evaluate the performance of language models and also to use language models to discover new insights into natural language grammar beyond existing linguistic theories. Overall, our results encourage a closer tie between experiments with human subjects and with language models. We propose methods to enable this closer tie, with language models as part of the experimental pipeline, and show this pipeline at work.
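A minimal sketch of the probing idea: compare a causal language model's log-probability of a sentence containing the NPI "any" in a licensed (negated) versus an unlicensed (affirmative) context. GPT-2 and the two example sentences are illustrative assumptions; the paper's actual materials, models, and metrics differ.

```python
# Hedged sketch: does the LM prefer "any" in a negated context over an affirmative one?
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
lm.eval()

def sentence_logprob(sentence: str) -> float:
    """Sum of token log-probabilities under the LM (higher means less surprising)."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    logp = torch.log_softmax(logits[:, :-1], dim=-1)          # predictions for tokens 1..n
    return logp.gather(-1, ids[:, 1:].unsqueeze(-1)).sum().item()

licensed = "The teacher didn't see any students in the hall."
unlicensed = "The teacher saw any students in the hall."
print(sentence_logprob(licensed) - sentence_logprob(unlicensed))  # expected to be positive
```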