The Comprehension of Figurative Language: What Is the Influence of Irony and Sarcasm on NLP Techniques?

Abstract

Due to the growing volume of available textual information, there is great demand for Natural Language Processing (NLP) techniques that can automatically process and manage texts, supporting information retrieval and communication in core areas of society (e.g. healthcare, business, and science). NLP techniques have to tackle the often ambiguous linguistic structures that people use in everyday speech. As such, many issues have to be considered, for instance slang, grammatical errors, regional dialects, and figurative language. Figurative Language (FL), such as irony, sarcasm, simile, and metaphor, poses a serious challenge to NLP systems. FL is a frequent phenomenon within human communication, occurring in both spoken and written discourse, including books, websites, fora, chats, social network posts, news articles, and product reviews. Indeed, knowing what people think can help companies, political parties, and other public entities in strategizing and in shaping decision-making policies. When people are engaged in an informal conversation, they almost inevitably use irony (or sarcasm) to express something other than what the literal sentence meaning states. Sentiment analysis methods can easily be misled by words that have a strong polarity but are used sarcastically, which means that the opposite polarity was intended. Several efforts have recently been devoted to detecting and tackling FL phenomena in social media. Many applications rely on task-specific lexicons (e.g. dictionaries, word classifications) or Machine Learning algorithms. Increasingly, companies have begun to leverage automated methods for inferring consumer sentiment from online reviews and other sources. A system capable of interpreting FL would be extremely beneficial to a wide range of practical NLP applications.
In this sense, this chapter aims at evaluating how two specific domains of FL, sarcasm and irony, affect Sentiment Analysis (SA) tools. The study’s ultimate goal is to determine whether FL hinders the performance (polarity detection) of SA systems due to the presence of ironic context. Our results indicate that computational intelligence approaches are more suitable in the presence of irony and sarcasm in Twitter classification.
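
The abstract's point that strongly polarized words mislead sentiment analysis when used sarcastically can be illustrated with a minimal sketch. It assumes a purely lexicon-based scorer; the `POLARITY` lexicon, the `lexicon_polarity` function, and both example sentences are hypothetical and are not taken from the chapter.

```python
import re

# Hypothetical toy polarity lexicon (assumption for illustration only);
# real lexicon-based systems use far larger resources.
POLARITY = {
    "great": 1, "love": 1, "wonderful": 1,
    "awful": -1, "hate": -1, "terrible": -1,
}

def lexicon_polarity(text: str) -> int:
    """Sum the polarity of every known word; a total above 0 reads as positive."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(POLARITY.get(token, 0) for token in tokens)

literal = "I love this phone, the battery is great."
sarcastic = "Great, my flight is delayed again. I just love waiting."

print(lexicon_polarity(literal))    # prints 2
print(lexicon_polarity(sarcastic))  # prints 2, although the intent is negative
```

Both sentences receive the same positive score because the scorer sees only the surface words "great" and "love"; nothing in it models the ironic context that reverses the intended polarity.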

Similar Papers
  • Research Article
  • 10.52783/jisem.v10i19s.3009
Quantum Computing Base Cybersecurity Mathematical Model Development for Geographically Underdeveloped Areas using Multiple Zonal Approaches using AIML Techniques for Stoppage of Different Types of Attacks
  • Mar 12, 2025
  • Journal of Information Systems Engineering and Management
  • Nandini G.S

Our methodology utilizes a supervised learning approach, employing Random Forest and Gradient Boosting Machines (GBM) trained on a comprehensive dataset that includes email headers, content, and sender behavior. This approach allows our models to discern complex patterns associated with phishing attempts, achieving a 92% detection rate, a substantial improvement over the traditional signature-based methods' 65% rate. Additionally, we integrated NLP techniques, specifically Word2Vec and GloVe, to extract semantic features from email content, enhancing our system's ability to identify malicious intent. The incorporation of NLP not only improves the precision of phishing detection by an additional 15% compared to conventional methods but also emphasizes the importance of semantic analysis in cybersecurity. This enhancement is crucial for understanding the subtle cues within email content that may indicate phishing, offering a more robust and effective defense mechanism for rural areas. By combining supervised learning with quantum computing and NLP, our approach addresses the significant gaps in traditional cybersecurity methods. This multi-layered strategy ensures a more reliable and efficient way to safeguard rural communities from the increasing threat of cyber attacks. The advanced AI techniques employed here leverage both the predictive power of machine learning and the nuanced understanding of language provided by NLP, setting a new standard in cybersecurity practices. The results of our study highlight the effectiveness of the proposed methodology, demonstrating a potential to markedly improve cybersecurity in resource-constrained rural environments. With a 92% phishing detection rate and an increase in precision through the use of NLP, our approach promises a significant advancement in the protection against cyber threats for rural areas, offering a comprehensive and scalable solution. 
This research presents an innovative multi-layered AI approach, utilizing quantum computing to enhance cybersecurity in rural areas vulnerable to phishing threats. The paper details the integration of sophisticated machine learning techniques, namely Random Forest and Gradient Boosting Machines (GBM), with Natural Language Processing (NLP) tools like Word2Vec and GloVe, achieving significant improvements in phishing detection rates. Through a comprehensive analysis of existing cybersecurity strategies and the limitations of traditional signature-based detection methods, this study proposes a robust solution tailored for rural settings such as Siddlagatta, Chikkaballapur, and Devanahalli. By incorporating quantum computing, the approach not only overcomes the constraints of classical computing but also leverages the predictive prowess of AI to offer a more reliable and effective defense against cyber threats. The results demonstrate a promising increase in detection rates, underscoring the potential of this quantum-enhanced, AI-driven strategy to significantly bolster cybersecurity in resource-limited rural environments.
Introduction: Cybersecurity in rural areas remains a pivotal concern, exacerbated by limited access to sophisticated technological resources and infrastructure. This paper introduces an advanced multi-layered artificial intelligence (AI) approach, utilizing quantum computing to enhance phishing threat detection in rural environments. Focusing on regions like Siddlagatta, Chikkaballapur, and Devanahalli, the study integrates supervised learning algorithms, namely Random Forest and Gradient Boosting Machines (GBM), with Natural Language Processing (NLP) techniques to improve the detection and analysis of phishing attempts. By leveraging machine learning to surpass traditional signature-based methods, this approach significantly boosts detection rates, presenting a tailored, effective solution to protect these vulnerable communities against evolving cyber threats.
Objectives: The objectives of this research are to develop and implement a multi-layered artificial intelligence (AI) approach, utilizing quantum computing to enhance the detection of phishing threats in rural areas. Specifically, the study aims to address the limitations of traditional signature-based detection methods by integrating advanced machine learning algorithms such as Random Forest and Gradient Boosting Machines (GBM) with Natural Language Processing (NLP) techniques. This integration seeks to improve the precision of identifying malicious intent in email communications by analyzing semantic features. The research also explores the effectiveness of these AI techniques in rural settings where cybersecurity resources are scarce, aiming to provide a more robust and efficient solution that can significantly reduce the incidence of phishing attacks in these vulnerable communities.
Methods: The proposed methodology entails the development of a web-based platform that melds social networking functionalities with sophisticated agricultural tools and services. By utilizing user profiles, the system effectively categorizes key stakeholders such as farmers, suppliers, experts, and policymakers to foster focused engagement and collaborative efforts. The integration of data from IoT sensors, satellite imagery, and user contributions is channeled into a central system that supports real-time analysis and informed decision-making. Moreover, the platform employs algorithms designed to align stakeholders with pertinent resources, market possibilities, and professional advice. Enhanced communication features like forums, direct messaging, and video conferencing are incorporated to promote interactive exchanges among users. A pilot phase involving select agricultural communities will be initiated to evaluate the practicality and impact of the framework, with subsequent adjustments driven by user feedback and analytic assessments. The ultimate goal of this framework is to boost connectivity, facilitate the efficient distribution of resources, and empower all involved parties through a scalable and intuitive interface. This approach not only aims to revolutionize the way agricultural communities interact and operate but also seeks to provide a robust foundation for continuous growth and innovation in the sector.
Results: The simulated results of the study demonstrate a significant enhancement in phishing detection capabilities through the integration of a multi-layered AI approach in rural settings. The deployment of advanced machine learning algorithms, such as Random Forest and Gradient Boosting Machines (GBM), along with Natural Language Processing (NLP) techniques, notably increased the phishing detection rate to 92%, a substantial improvement over the 65% detection rate achieved by traditional signature-based methods. Additionally, the incorporation of NLP through tools like Word2Vec and GloVe improved the precision of identifying malicious intent by an additional 15%, emphasizing the effectiveness of semantic analysis in distinguishing phishing attempts. These results highlight the potential of combining machine learning and quantum computing to address the unique cybersecurity challenges faced in rural areas, providing a robust solution that significantly enhances the detection and prevention of phishing threats.
Conclusions: The research presented in this paper successfully demonstrates the efficacy of a multi-layered AI approach in significantly enhancing cybersecurity against phishing threats in rural areas. By integrating advanced machine learning algorithms with Natural Language Processing techniques and quantum computing, the study achieved a notable increase in phishing detection rates, outperforming traditional signature-based methods with a detection rate of 92%. This approach not only addresses the limitations inherent in existing cybersecurity measures but also tailors its strategy to the unique challenges posed by the limited resources and infrastructure in rural environments. The integration of semantic analysis through NLP further enhanced the precision of threat detection, providing a more nuanced understanding of malicious intent. Overall, the study underscores the potential of sophisticated AI technologies to transform cybersecurity practices in underserved areas, ensuring more effective protection against evolving cyber threats.

  • Supplementary Content
  • 10.2196/72853
The Use of Natural Language Processing to Interpret Unstructured Patient Feedback on Health Services: Scoping Review
  • Aug 14, 2025
  • Journal of Medical Internet Research
  • Ali Feizollah + 5 more

Background: Unstructured patient feedback (UPF) allows patients to freely express their experiences without the constraints of predefined questions. The proliferation of online health care rating websites has created a vast source of UPF. Natural language processing (NLP) techniques, particularly sentiment analysis and topic modeling, are increasingly being used to analyze UPF in health care settings; however, the scope and clinical relevance of these technologies are unclear.
Objective: This scoping review investigates how NLP techniques are being used to interpret UPF, with a focus on the health care settings in which this is used, the purposes for using these technologies, and any impacts reported on clinical practice.
Methods: Searches of the MEDLINE, Embase, CINAHL, Cochrane Database of Reviews, and Google Scholar were conducted in February 2024. No date limits were applied. Eligibility criteria included English-language studies that used NLP techniques on UPF that pertained to an identifiable health care setting or providers. Studies were excluded if human actors solely performed coding or if NLP was applied to structured feedback or non–patient-generated content. Data were extracted and narratively synthesized regarding health care settings, NLP methods, and clinical applications.
Results: From 4017 records, 52 studies met inclusion criteria. NLP was most commonly applied to UPF from secondary care settings (n=33) with fewer in primary (n=10) or community (n=5) care. Three NLP techniques were identified in the included studies: sentiment analysis (n=32), topic modeling (n=15), and text classification (n=7). Sentiment analysis was applied to explore associations between patient sentiment and health care provider characteristics, track emotional responses over time, and identify areas for improvement in health care delivery. Topic modeling, primarily using the latent Dirichlet allocation algorithm, was used to uncover latent themes in patient feedback, compare patient experiences across different health care settings, and track changes in patient concerns over time. Text classification was used to categorize patient feedback into predefined topics. The association between NLP-derived insights and traditional health care quality metrics was limited, with few studies describing concrete clinical impacts resulting from their analyses.
Conclusions: NLP has been applied to UPF across a number of contexts, primarily to identify features of health services or professionals that support good patient experience. The growth of research publications demonstrates an academic interest in these technologies, but there is little evidence these approaches are being used in clinical settings. Future research is required to assess how NLP may capture the nuance of health care interactions, align with existing quality metrics, and how it may be used to influence clinician behavior.

  • Preprint Article
  • 10.2196/preprints.72853
The use of natural language processing to interpret unstructured patient feedback on health services: A scoping review (Preprint)
  • Feb 20, 2025
  • Ali Feizollah + 5 more

BACKGROUND Unstructured patient feedback (UPF) allows patients to freely express their experiences without the constraints of predefined questions. The proliferation of online healthcare rating websites has created a vast source of UPF. Natural language processing (NLP) techniques, particularly sentiment analysis and topic modelling, are increasingly being used to analyse UPF in healthcare settings, however the scope and clinical relevance of these technologies is unclear. OBJECTIVE This scoping review investigates how NLP techniques are being used to interpret UPF, with focus on the healthcare settings in which this is used, the purposes for using these technologies, and any impacts reported on clinical practice. METHODS Searches of the MEDLINE, EMBASE, CINAHL, Cochrane Database of Reviews, and Google Scholar were conducted in February 2024. No date limits were applied. English language studies that used NLP techniques on UPF that pertained to an identifiable health care setting or provider were included. Data extraction focused on the healthcare setting, NLP methods used, and applications of these techniques. RESULTS 52 studies were included. NLP was most commonly applied to UPF from secondary care settings (n=33) with fewer in primary (n=10) or community (n=5) care. Three NLP techniques were identified in the included studies: sentiment analysis (n=32), topic modelling (n=15) and text classification (n=7). Sentiment analysis was applied to explore associations between patient sentiment and healthcare provider characteristics, track emotional responses over time, and identify areas for improvement in healthcare delivery. Topic modelling, primarily using Latent Dirichlet Allocation (LDA) algorithm, was employed to uncover latent themes in patient feedback, compare patient experiences across different healthcare settings, and track changes in patient concerns over time. Text classification was used to categorize patient feedback into predefined topics. 
The association between NLP-derived insights and traditional healthcare quality metrics was limited, with few studies describing concrete clinical impacts resulting from their analyses. CONCLUSIONS NLP has been applied to UPF across a number of contexts, primarily to identify features of health services or professionals that support good patient experience. The growth of research publications demonstrates an academic interest in these technologies, but there is little evidence these approaches are being employed in clinical settings. Future research is required to assess how NLP may capture the nuance of healthcare interactions, align with existing quality metrics, and how it may be used to influence clinician behaviour.

  • Book Chapter
  • Cite Count Icon 2
  • 10.1201/9781003132110-7
Natural Language Processing Utilisation in Healthcare
  • Feb 4, 2022
  • S Vani + 3 more

The importance and use of natural language processing (NLP) have grown considerably in the medical domain, where it is applied to clinical data for clinical studies and clinical trials. Performing these trials has led to many advances. Generally, NLP techniques were designed for word- and sentence-based searches that return the best result for the search criteria, for example, using keywords such as disease names, medicine names, or side effects of a particular drug, or suggesting a drug based on a person’s symptoms. Electronic health records (EHR) play a major role in storing patients’ medical records over time as they visit various doctors. The main advantage of EHR is that the history of the health records can be tracked very easily. Based on NLP and EHR techniques, general notes and suggestions are given to the doctor to simplify the task, and this keyword search technique provides many advantages, such as reducing the time for disease identification, helping doctors make the correct decision, and freeing time for more patients. Even though NLP performs these numerous tasks, there are challenges in using it in the medical domain where it needs to improve. For EHR, many technical challenges have to be overcome, such as resistance, performance, and effectiveness in generating results. In this chapter we present a complete survey of NLP with its limitations and show how NLP achieves efficient results in the medical domain.

  • Conference Article
  • Cite Count Icon 24
  • 10.1145/3018896.3036375
A comprehensive investigation of natural language processing techniques and tools to generate automated test cases
  • Mar 22, 2017
  • Imran Ahsan + 3 more

Natural Language Processing (NLP) techniques show promising results in organizing and identifying desired information from bulky raw data. As a result, NLP techniques are continuously attracting researchers' attention as a way to automate software development activities such as test case generation. However, selecting the right NLP techniques and tools to generate automated test cases is always challenging. Therefore, in this paper, we investigate the application of NLP techniques to generate test cases from preliminary requirements documents. A Systematic Literature Review (SLR) has been conducted to identify 16 research works published during 2005-2014. Consequently, 6 NLP techniques and 18 tools have been identified. Furthermore, 4 test case generation approaches and 9 NLP algorithms have also been presented. The identified NLP techniques and tools are highly beneficial for the researchers and practitioners of the domain.

  • Research Article
  • 10.17862/cranfield.rd.10066229.v1
Increasing the accessibility of NLP techniques for Defence and Security using a web-based tool
  • Nov 19, 2019
  • Katie Paxton-Fear

As machine learning becomes more common in defence and security, there is a real risk that the low accessibility of techniques to non-specialists will hinder the process of operationalising the technologies. This poster will present a tool to support a variety of Natural Language Processing (NLP) techniques, including the management of corpora (data sets of documents used for NLP tasks), creating and training models, and visualising the output of the models. The aim of this tool is to allow non-specialists to exploit complex NLP techniques to understand the content of large volumes of reports.
NLP techniques are the mechanisms by which a machine can process and analyse text written by humans. These methods can be used for a range of tasks including categorising documents, translation and summarising text. For many of these tasks the ability to process and analyse large corpora of text is key. With current methods, the ability to manage corpora is rarely considered; instead, researchers and practitioners are relied on to do this manually in their file system. To train models, researchers use ad-hoc code directly, writing scripts or code and compiling or running them through an interpreter. These approaches can be a challenge when working in multidisciplinary fields, such as defence and security and cyber security. This is even more salient when delivering research where outputs may be operationalised and the accessibility can be a limiting factor in their deployment and use.
We present a web interface that uses an asynchronous service-based architecture to enable non-specialists to easily manage multiple large corpora and create and operationalise a variety of different models; at this early stage we have focussed on one NLP technique, that of topic models.
This tool support has been created as part of a project considering the use of NLP to better understand reports of insider threat attacks. These are security incidents where the attacker is a member of staff or another trusted individual. Insider threat attacks are particularly difficult to defend against due to the level of access these individuals gain during the regular course of their employment. The wider use of these techniques would generate greater impact both tactically in defending against these attacks and strategically in developing policy and procedures. There are tools available; however, they are often complex and perform a single task, limiting their use. To generate maximum impact from our research we have developed this web-based software to make the tools more accessible, especially to non-specialist researchers, customers and potential users.

  • Research Article
  • Cite Count Icon 16
  • 10.1111/lang.12243
Language Learning Research at the Intersection of Experimental, Computational, and Corpus‐Based Approaches
  • Jun 1, 2017
  • Language Learning
  • Patrick Rebuschat + 2 more

  • Research Article
  • Cite Count Icon 89
  • 10.1109/access.2021.3070606
User Stories and Natural Language Processing: A Systematic Literature Review
  • Jan 1, 2021
  • IEEE Access
  • Indra Kharisma Raharjana + 2 more

Context: User stories have been widely accepted as artifacts to capture the user requirements in agile software development. They are short pieces of text in a semi-structured format that express requirements. Natural language processing (NLP) techniques offer a potential advantage in user story applications.
Objective: Conduct a systematic literature review to capture the current state-of-the-art of NLP research on user stories.
Method: The search strategy is used to obtain relevant papers from SCOPUS, ScienceDirect, IEEE Xplore, ACM Digital Library, SpringerLink, and Google Scholar. Inclusion and exclusion criteria are applied to filter the search results. We also use the forward and backward snowballing techniques to obtain more comprehensive results.
Results: The search results identified 718 papers published between January 2009 and December 2020. After applying the inclusion/exclusion criteria and the snowballing technique, we identified 38 primary studies that discuss NLP techniques in user stories. Most studies used NLP techniques to extract aspects of who, what, and why from user stories. The purpose of NLP studies in user stories is broad, ranging from discovering defects, generating software artifacts, and identifying the key abstraction of user stories to tracing links between model and user stories.
Conclusion: NLP can help system analysts manage user stories. Implementing NLP in user stories has many opportunities and challenges. Exploration of NLP techniques and rigorous evaluation methods are required to obtain quality research. As with NLP research in general, the ability to understand a sentence’s context continues to be a challenge.

  • Research Article
  • Cite Count Icon 29
  • 10.1016/j.ijmedinf.2022.104779
Applications of natural language processing in radiology: A systematic review
  • Apr 26, 2022
  • International journal of medical informatics
  • Nathaniel Linna + 1 more

  • Conference Article
  • 10.1109/argencon.2014.6868539
Una comparación de técnicas de NLP semánticas para analizar casos de uso (A comparison of semantic NLP techniques for analyzing use cases)
  • Jun 1, 2014
  • Alejandro Rago + 2 more

The inspection of documents written in natural language with computers has become feasible thanks to advances in Natural Language Processing (NLP) techniques. However, certain applications require a deeper semantic analysis of the text to produce good results. In this article, we present an exploratory study of semantic-aware NLP techniques for discovering latent concerns in use case specifications. For this purpose, we propose two NLP techniques, namely semantic clustering and semantically-enriched rules. After evaluating these two techniques and comparing them with a technique developed by other researchers, results showed that semantic NLP techniques hold great potential for detecting candidate concerns. In particular, if these techniques are properly configured, they can help to reduce the effort of requirements analysts and promote better quality in software development.

  • Research Article
  • Cite Count Icon 112
  • 10.1109/access.2022.3183083
A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques
  • Jan 1, 2022
  • IEEE Access
  • Said Salloum + 3 more

Every year, phishing results in losses of billions of dollars and is a major threat to the Internet economy. Phishing attacks are now most often carried out by email. To better comprehend the existing research trend of phishing email detection, several review studies have been performed. However, it is important to assess this issue from different perspectives. None of the surveys has comprehensively studied the use of Natural Language Processing (NLP) techniques for detection of phishing, except one that shed light on the use of NLP techniques for classification and training purposes while exploring a few alternatives. To bridge the gap, this study aims to systematically review and synthesise research on the use of NLP for detecting phishing emails. Based on specific predefined criteria, a total of 100 research articles published between 2006 and 2022 were identified and analysed. We study the key research areas in phishing email detection using NLP, machine learning algorithms used in phishing email detection, text features in phishing emails, datasets and resources that have been used in phishing emails, and the evaluation criteria. The findings include that the main research area in phishing detection studies is feature extraction and selection, followed by methods for classifying and optimizing the detection of phishing emails. Amongst the range of classification algorithms, support vector machines (SVMs) are heavily utilised for detecting phishing emails. The most frequently used NLP techniques are found to be TF-IDF and word embeddings. Furthermore, the most commonly used dataset for benchmarking phishing email detection methods is the Nazario phishing corpus. Also, Python is the most commonly used programming language for phishing email detection. It is expected that the findings of this paper can be helpful for the scientific community, especially in the field of NLP application in cybersecurity problems.
This survey is also unique in the sense that it relates works to their openly available tools and resources. The analysis of the presented works revealed that not much work had been performed on Arabic language phishing emails using NLP techniques. Therefore, many open issues are associated with Arabic phishing email detection.
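
The abstract above names TF-IDF as one of the most frequently used NLP techniques in phishing email detection. As a minimal sketch of what that weighting computes, the following uses the plain textbook form (term frequency times the log of inverse document frequency) with whitespace tokenization. The `tf_idf` function and the three example emails are assumptions for illustration and are not drawn from the survey or from the Nazario corpus; library implementations often apply smoothing and normalization that are omitted here.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical email snippets, invented for this example.
emails = [
    "verify your account now",
    "meeting notes for your team",
    "your invoice is attached",
]
weights = tf_idf(emails)
print(weights[0]["your"])    # prints 0.0
print(weights[0]["verify"])  # positive: the term occurs in only one email
```

A term that occurs in every document, such as "your" here, receives zero weight, while a term confined to one email is weighted up. That is the property that makes the weights usable as input features for a classifier.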

  • Research Article
  • Cite Count Icon 10
  • 10.12948/issn14531305/20.4.2016.05
From Natural Language Text to Visual Models: A survey of Issues and Approaches
  • Dec 30, 2016
  • Informatica Economica
  • Cristina-Claudia OSMAN + 1 more

1 IntroductionOrganizations focus on automate their processes in order to improve efficiency, reduce costs, and/or reduce human beings' errors in an easy and rapid manner. Business process management (BPM) methods provide a solution to this issue. In this context, information systems like CRM, ERP, SCM, etc. have known an increasing demand. The main problem consists of the length of the business process specifications. If new regulations appear, these specifications must be adapted. Manually extraction of visual models is time consuming. During time, a series of solutions were proposed. The literature shows a crowd of approaches that extracts data models [1], [2], [3], and process models [4], [5] from Natural Language (NL) text. This paper aims to analyze the Natural Language Processing (NLP) techniques and tools used in order to provide different types of visual representations.On the last few years, approaches based on NLP have been developed in order to automate process conversion from NL text. NLP plays an important role in NL text analysis as NLP tries to understand speech and text as humans beings would do. Colloquialism, abbreviations or typos make this task a challenging one. NLP has the origins in 1950s when Alan Turing proposed the Turing test [6], by introducing the imitation game. Since then, the literature shows a plethora of NLP tools [7], [8], [9], [10] using several machine learning techniques, our focus being on those applied on data models and process models discovery from NL text, starting with the first language parser [11] to the actual ones like NLTk [7], [8], ANTLR1, etc.Linguistic analysis is closely tied to NLP. 
Liddy [12] highlights 7 levels of linguistic analysis: a) Phonetic or Phonological level: how words are pronounced, b) Morphological level: prefixes, suffixes and roots analysis, c) Lexical level: word level analysis including lexical meaning and Part-Of-Speech (POS) analysis, d) Syntactic level: grammatical analysis of words in a sentence, e) Semantic level: determining the possible meanings of sentences, f) Discourse level: interpreting structure and meaning for texts larger than a sentence, g) Pragmatic level: understanding the purpose of a language.Some of the problems approached by NLP are: POS tagging, parsing, Named Entity Recognition (NER), chunking, Semantic Role Labeling (SRL). Anaphora resolution [13] refers to the interpretation of the link between the anaphor and its antecedents.The remainder of the paper is organized as follows: Section 2 briefly outlines the NLP domain, describing the levels of linguistic analysis and the main NLP approaches of analyzing NL requirements. Section 3 focuses on data and process models extraction from NL text. This section makes an introduction to Object Oriented Analysis and Business Process Modeling and analyses the existing tools that discover data and process models from text. Subsequently, Section 4 summarizes the results of this work and the conclusions are drawn in Section 5.2NLPA detailed review on NLP is given in [14] and in [15]. Jones [14] divides the history of NLP into four phases: the first starts at the beginning of the 1940s and lasts to the late 1960s, the second begins from the end of 60s and lasts to the end of 70s, the third is represented by late 80s, where the fourth phase starts in the late of 90s. Next, we will detail each phase as they were defined in [14] and [15]. First phase treated machine translation issues, while the second focused on artificial intelligence. The third phase can be called grammatico-logical phase, which is followed by the lexical phase. 
A fifth phase, in which formal theories and statistical data are combined, is proposed in [16]. Since this study was first published in 1994 and then reorganized in 2001, we can add a sixth phase, from 2000 to the present, in which NLP techniques are combined to contribute to the extraction of visual models.

Software requirements are usually written in NL, which is asymmetric and irregular [17]. …

  • Research Article
  • Cite Count Icon 2
  • 10.2196/44191
Automated Identification of Aspirin-Exacerbated Respiratory Disease Using Natural Language Processing and Machine Learning: Algorithm Development and Evaluation Study.
  • Jun 12, 2023
  • JMIR AI
  • Thanai Pongdee + 5 more

Aspirin-exacerbated respiratory disease (AERD) is an acquired inflammatory condition characterized by the presence of asthma, chronic rhinosinusitis with nasal polyposis, and respiratory hypersensitivity reactions on ingestion of aspirin or other nonsteroidal anti-inflammatory drugs (NSAIDs). Despite AERD having a classic constellation of symptoms, the diagnosis is often overlooked, with an average of greater than 10 years between the onset of symptoms and diagnosis of AERD. Without a diagnosis, individuals will lack opportunities to receive effective treatments, such as aspirin desensitization or biologic medications. Our aim was to develop a combined algorithm that integrates both natural language processing (NLP) and machine learning (ML) techniques to identify patients with AERD from an electronic health record (EHR). A rule-based decision tree algorithm incorporating NLP-based features was developed using clinical documents from the EHR at Mayo Clinic. From clinical notes, using NLP techniques, 7 features were extracted that included the following: AERD, asthma, NSAID allergy, nasal polyps, chronic sinusitis, elevated urine leukotriene E4 level, and documented no-NSAID allergy. MedTagger was used to extract these 7 features from the unstructured clinical text given a set of keywords and patterns based on the chart review of 2 allergy and immunology experts for AERD. The status of each extracted feature was quantified by assigning the frequency of its occurrence in clinical documents per subject. We optimized the decision tree classifier's hyperparameters cutoff threshold on the training set to determine the representative feature combination to discriminate AERD. We then evaluated the resulting model on the test set. 
The AERD algorithm, which combines NLP and ML techniques, achieved an area under the receiver operating characteristic curve score, sensitivity, and specificity of 0.86 (95% CI 0.78-0.94), 80.00 (95% CI 70.82-87.33), and 88.00 (95% CI 79.98-93.64) for the test set, respectively. We developed a promising AERD algorithm that needs further refinement to improve AERD diagnosis. Continued development of NLP and ML technologies has the potential to reduce diagnostic delays for AERD and improve the health of our patients.
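As a rough illustration of the pipeline described in this abstract (keyword-pattern features counted per subject, followed by a threshold-based rule), the following Python sketch uses only the standard library. The patterns, the three-feature subset, the `min_count` cutoff, and the rule itself are hypothetical placeholders; they are neither MedTagger's rule set nor the decision tree the authors trained, and the sketch ignores negation (for example, "no asthma" would still be counted).

```python
import re
from collections import Counter

# Hypothetical keyword patterns for a few of the features named in the abstract.
# The study derived its patterns from expert chart review; these are placeholders.
PATTERNS = {
    "asthma": re.compile(r"\basthma\b", re.IGNORECASE),
    "nasal_polyps": re.compile(r"\bnasal polyp(?:s|osis)?\b", re.IGNORECASE),
    "nsaid_allergy": re.compile(
        r"\b(?:aspirin|nsaid) (?:allergy|sensitivity)\b", re.IGNORECASE
    ),
}


def count_features(notes):
    """Count how often each feature pattern occurs across one subject's notes."""
    counts = Counter({name: 0 for name in PATTERNS})
    for note in notes:
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(note))
    return counts


def flag_subject(notes, min_count=1):
    """Toy rule: flag a subject when every feature occurs at least `min_count` times."""
    counts = count_features(notes)
    return all(counts[name] >= min_count for name in PATTERNS)
```

In a real setting the per-subject counts would feed a trained classifier whose thresholds are tuned on a training set, as the abstract describes, rather than a fixed all-features rule.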

  • Research Article
  • Cite Count Icon 1
  • 10.1186/s12911-025-02851-w
Natural language processing to identify suicidal ideation and anhedonia in major depressive disorder.
  • Jan 13, 2025
  • BMC medical informatics and decision making
  • L Alexander Vance + 8 more

Anhedonia and suicidal ideation are symptoms of major depressive disorder (MDD) that are not regularly captured in structured scales but may be captured in unstructured clinical notes. Natural language processing (NLP) techniques may be used to extract longitudinal data on suicidal behaviors and anhedonia within unstructured clinical notes. This study assessed the accuracy of using NLP techniques on electronic health records (EHRs) to identify these symptoms among patients with MDD. EHR-derived, de-identified data were used from the NeuroBlu Database (version 23R1), a longitudinal behavioral health real-world database. Mental health clinicians annotated instances of anhedonia and suicidal symptoms in clinical notes creating a ground truth. Interrater reliability (IRR) was calculated using Krippendorff's alpha. A novel transformer architecture-based NLP model was trained on clinical notes to recognize linguistic patterns and contextual cues. Each sentence was categorized into one of four labels: (1) anhedonia; (2) suicidal ideation without intent or plan; (3) suicidal ideation with intent or plan; (4) absence of suicidal ideation or anhedonia. The model was assessed using positive predictive values (PPV), negative predictive values, sensitivity, specificity, F1-score, and AUROC. The model was trained, tested, and validated on 2,198, 1,247, and 1,016 distinct clinical notes, respectively. IRR was 0.80. For anhedonia, suicidal ideation with intent or plan, and suicidal ideation without intent or plan the model achieved a PPV of 0.98, 0.93, and 0.87, an F1-score of 0.98, 0.91, and 0.89 during training and a PPV of 0.99, 0.95, and 0.87 and F1-score of 0.99, 0.95, and 0.89 during validation. NLP techniques can leverage contextual information in EHRs to identify anhedonia and suicidal symptoms in patients with MDD. 
Integrating structured and unstructured data offers a comprehensive view of MDD's trajectory, helping healthcare providers deliver timely, effective interventions. Addressing current limitations will further enhance NLP models, enabling more accurate extraction of critical clinical features and supporting personalized, proactive mental health care.
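The evaluation metrics named in this abstract follow directly from confusion-matrix counts when one label is treated one-vs-rest. The sketch below is a minimal Python illustration; the function name and any counts passed to it are made up for demonstration and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return PPV, NPV, sensitivity, specificity, and F1 from confusion counts.

    Assumes every denominator is non-zero, i.e. the label is both present
    and predicted at least once, and likewise for its absence.
    """
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    npv = tn / (tn + fn)            # negative predictive value
    sensitivity = tp / (tp + fn)    # recall, true positive rate
    specificity = tn / (tn + fp)    # true negative rate
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "ppv": ppv,
        "npv": npv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }
```

For a four-label task such as the one described, this would be applied once per label, counting sentences with that label as positives and all others as negatives.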

  • Book Chapter
  • Cite Count Icon 2
  • 10.4018/979-8-3693-2165-2.ch002
AI Voice Assistant for Smartphones With NLP Techniques
  • Apr 19, 2024
  • Fungai Jacqueline Kiwa + 2 more

The AI voice assistant mobile application was developed to help drivers operate their mobile phones while driving without touching them. The literature review examines multiple innovative artificial intelligence technologies used in voice assistant applications based on natural language processing (NLP) techniques. The methodology followed a qualitative approach, and the design science paradigm was used to develop the voice assistant for smartphones with NLP techniques. The NLP techniques applied in the development of the AI voice assistant are smart synthesis, data flow sequence, core and interface accessing, part-of-speech tagging, named entity recognition, coreference resolution, and Porter stemming. Operations supported by the application include performing arithmetic calculations from voice commands and returning the computed result via voice, searching the internet based on user voice input, and providing a response via voice assistance.
