Predictive factors for inter-agency partnership success.
- Research Article
2
- 10.1093/ehjci/ehaa946.0723
- Nov 1, 2020
- European Heart Journal
Background Syncope is a common presenting symptom in emergency departments. While the majority of episodes are benign, syncope is associated with worse prognosis in hypertrophic cardiomyopathy, arrhythmia syndromes, heart failure, aortic stenosis and coronary heart disease. Flagging documented syncope in these patients may be crucial to management decisions. Previous studies show that the International Classification of Diseases (ICD) codes for syncope have a sensitivity of around 0.63, leading to a large number of false negatives if patient identification is based on administrative codes. Thus, in order to provide data-driven clinical decision support, and to improve identification of patient cohorts for research, better tools are needed. A recent study manually annotated more than 30,000 patient records in order to develop a natural language processing (NLP) tool, which achieved a sensitivity of 92.2%. Since access to medical records and annotation resources is limited, we aimed to investigate whether an unsupervised machine learning and NLP approach with no manual input could achieve similar performance. Methods Our data consisted of admission notes for adult patients admitted between 2005 and 2016 at a large university hospital in Norway. 500 records from patients with, and 500 without, an "R55 Syncope" ICD code at discharge were drawn at random; the R55 code was considered ground truth. Headers containing information about tentative diagnoses, when present, were removed from the notes using regular expressions. The dataset was divided into 70%/15%/15% subsets for training, validation and testing. Baseline identification was calculated by simple lexical matching on the term "synkope" (Norwegian for syncope). We evaluated two linear classifiers, a Support Vector Machine (SVM) and a Logistic Regression (LR) model, with a term frequency–inverse document frequency (TF-IDF) vectorizer, using a bag-of-words approach. In addition, we evaluated a simple convolutional neural network (CNN) consisting of a convolutional layer concatenating filter sizes of 3–5, max pooling and a dropout of 0.5, with randomly initialised word embeddings of 300 dimensions. Results Even the baseline regular expression model achieved a sensitivity of 78% and a specificity of 91% when classifying admission notes as belonging to the syncope class or not. The SVM and LR models achieved sensitivities of 91% and 89%, respectively, and specificities of 89% and 91%. The CNN model had a sensitivity of 95% and a specificity of 84%. Conclusion With a limited non-English dataset, common NLP and machine learning approaches were able to achieve approximately 90–95% sensitivity for the identification of admission notes related to syncope. Linear classifiers outperformed the CNN model in terms of specificity, as expected with this small dataset. The study demonstrates the feasibility of training document classifiers based on diagnostic codes in order to detect important clinical events. (Figure: ROC curves for the SVM and LR models.) Funding Acknowledgement Type of funding source: Public grant(s) – National budget only. Main funding source(s): The Research Council of Norway
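The linear-classifier setup described above (a TF-IDF bag-of-words vectorizer feeding an SVM or logistic regression) can be sketched with scikit-learn. This is an illustrative sketch, not the study's actual code; the toy notes and labels below are invented stand-ins for the Norwegian admission notes.

```python
# Sketch of a TF-IDF bag-of-words pipeline with linear classifiers,
# as in the SVM / logistic regression setup described above.
# The notes and labels are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

notes = [
    "patient admitted after sudden loss of consciousness, suspected synkope",
    "brief syncopal episode while standing, regained consciousness quickly",
    "chest pain radiating to left arm, no loss of consciousness",
    "elective admission for knee replacement surgery",
]
labels = [1, 1, 0, 0]  # 1 = syncope-related admission note

for clf in (LinearSVC(), LogisticRegression()):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(notes, labels)
    print(type(clf).__name__, model.predict(notes).tolist())
```

In a real replication, sensitivity and specificity would be computed on the held-out 15% test split rather than the training notes shown here.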
- Research Article
- 10.15379/ijmst.v10i2.1828
- Sep 5, 2023
- International Journal of Membrane Science and Technology
This systematic literature review (SLR) examines the current practices, challenges, proposed solutions, and limitations of natural language processing (NLP) and machine learning (ML) approaches to improving requirements specification in software requirements engineering. The review focuses on research conducted in the last five years and includes a selection of papers that discuss the use of NLP and ML techniques for enhancing the accuracy and clarity of requirements, particularly in the context of functional and non-functional requirements. The findings highlight the benefits and challenges associated with integrating NLP and ML approaches, such as improved classification and identification of requirements. However, there is a greater emphasis on non-functional requirements, with limited representation of research on functional requirements. This review is compared with two recent reviews to highlight the differences and its novel contribution. The review also identifies limitations, including a potential bias in assuming that problems related to requirements documentation or specification can be easily resolved through simple changes, as well as the need to address functional requirements. The insights from this SLR contribute to the understanding of the current state of research in this field and provide a foundation for future research directions and practical applications in leveraging NLP and ML approaches to enhance requirements specification in software requirements engineering.
- Research Article
2
- 10.3389/feduc.2023.1240962
- Oct 12, 2023
- Frontiers in Education
The growing body of creativity research involves Artificial Intelligence (AI) and Machine Learning (ML) approaches to automatically evaluating creative solutions. However, numerous challenges persist in evaluating creativity dimensions and in the methodologies employed for automatic evaluation. This paper addresses this research gap with a scoping review that maps Natural Language Processing (NLP) approaches to computations of different creativity dimensions. The review has two research objectives covering the scope of automatic creativity evaluation: to identify different computational approaches and techniques in creativity evaluation, and to analyze the automatic evaluation of different creativity dimensions. As a first result, the scoping review categorizes the automatic creativity research in the reviewed papers into three NLP approaches: text similarity, text classification, and text mining. This categorization, together with a compilation of the computational techniques used in these NLP approaches, helps clarify their application scenarios, research gaps, limitations, and alternative solutions. As a second result, a thorough analysis differentiated the evaluation of 25 distinct creativity dimensions. Based on similarities in definitions and computations, we characterized seven core creativity dimensions: novelty, value, flexibility, elaboration, fluency, feasibility, and others related to playful aspects of creativity. We hope this scoping review provides valuable insights for researchers from psychology, education, AI, and other fields to make evidence-based decisions when developing automated creativity evaluation.
- Research Article
88
- 10.1186/s12911-020-01318-4
- Dec 1, 2020
- BMC Medical Informatics and Decision Making
Background Diabetes mellitus is a prevalent metabolic disease characterized by chronic hyperglycemia. The avalanche of healthcare data is accelerating precision and personalized medicine. Artificial intelligence and algorithm-based approaches are becoming more and more vital to support clinical decision-making. These methods are able to augment health care providers by taking away some of their routine work and enabling them to focus on critical issues. However, few studies have used predictive modeling to uncover associations between comorbidities in ICU patients and diabetes. This study aimed to use Unified Medical Language System (UMLS) resources, involving machine learning and natural language processing (NLP) approaches, to predict the risk of mortality. Methods We conducted a secondary analysis of Medical Information Mart for Intensive Care III (MIMIC-III) data. Different machine learning modeling and NLP approaches were applied. Domain knowledge in health care is built on dictionaries created by experts who defined clinical terminologies such as medications or clinical symptoms. This knowledge is valuable for identifying information in text notes that asserts a certain disease. Knowledge-guided models can automatically extract knowledge from clinical notes or biomedical literature containing conceptual entities and the relationships among these concepts. Mortality classification was based on the combination of knowledge-guided features and rules. UMLS entity embedding and a convolutional neural network (CNN) with word embeddings were applied. Concept Unique Identifiers (CUIs) with entity embeddings were utilized to build clinical text representations. Results The best configuration of the employed machine learning models yielded a competitive AUC of 0.97. Machine learning models along with NLP of clinical notes are promising for assisting health care providers in predicting the risk of mortality of critically ill patients. Conclusion UMLS resources and clinical notes are powerful and important tools for predicting mortality in diabetic patients in the critical care setting. The knowledge-guided CNN model is effective (AUC = 0.97) for learning hidden features.
- Book Chapter
1
- 10.1007/978-981-19-1076-0_13
- Jan 1, 2022
The significance of integrating Natural Language Processing (NLP) approaches in healthcare research has become more prominent in recent years and has had a transformational impact on the state of the art. In healthcare, NLP systems are developed and assessed on the basis of word-, phrase-, or record-level annotations derived from patient reports, such as side-effects of medications, medicines prescribed for illnesses, or semantic characteristics (negation, severity). While some NLP projects take into account user expectations at the level of an individual or a group, these projects are still in the minority. A special focus is placed on mental health research, which currently receives little attention in healthcare NLP research networks even though NLP approaches are widely used there. Although there have been significant advancements in healthcare NLP method development, we believe that for the field to grow further, more emphasis should be placed on comprehensive evaluation. To help with this, we offer some practical suggestions, including a brief protocol that may be used when reporting clinical NLP method development and evaluation. Keywords: Natural language processing; Big Data; Health care; Semantic similarities; Electronic health records (EHRs); Classification; Mental health; Kawasaki disease; Huntsman Cancer Institute; Linguamatics NLP platform; Genomic; Bio-specimen; Morphology
- Research Article
16
- 10.1016/j.jadr.2022.100430
- Dec 1, 2022
- Journal of affective disorders reports
Portability of natural language processing methods to detect suicidality from clinical text in US and UK electronic health records.
- Research Article
- 10.1093/jamia/ocaf141
- Sep 22, 2025
- Journal of the American Medical Informatics Association: JAMIA
Objective Rule-based structured data algorithms and natural language processing (NLP) approaches applied to unstructured clinical notes have limited accuracy and poor generalizability for identifying immunosuppression. Large language models (LLMs) may effectively identify patients with heterogeneous types of immunosuppression from unstructured clinical notes. We compared the performance of LLMs applied to unstructured notes for identifying patients with immunosuppressive conditions or immunosuppressive medication use against 2 baselines: (1) structured data algorithms using diagnosis codes and medication orders and (2) NLP approaches applied to unstructured notes. Materials and Methods We used hospital admission notes from a primary cohort of 827 intensive care unit (ICU) patients at Northwestern Memorial Hospital and a validation cohort of 200 ICU patients at Beth Israel Deaconess Medical Center, along with diagnosis codes and medication orders from the primary cohort. We evaluated the performance of structured data algorithms, NLP approaches, and LLMs in identifying 7 immunosuppressive conditions and 6 immunosuppressive medications. Results In the primary cohort, structured data algorithms achieved peak F1 scores ranging from 0.30 to 0.97 for identifying immunosuppressive conditions and medications. NLP approaches achieved peak F1 scores ranging from 0 to 1. GPT-4o outperformed or matched structured data algorithms and NLP approaches across all conditions and medications, with F1 scores ranging from 0.51 to 1. GPT-4o also performed impressively in our validation cohort (F1 = 1 for 8/13 variables). Discussion LLMs, particularly GPT-4o, outperformed structured data algorithms and NLP approaches in identifying immunosuppressive conditions and medications, with robust external validation. Conclusion LLMs can be applied for improved cohort identification for research purposes.
- Book Chapter
6
- 10.1007/978-981-16-1502-3_30
- Jan 1, 2021
Fake news can be defined as a fabricated story created to fool or cheat people. The current research aims to detect fake news in social media such as Twitter, WhatsApp and Facebook by studying the responses of the proposed model on posts acquired from the Reddit online news store. Automatic fake news detection is a complex activity, as it requires the model to implement natural language processing concepts in tandem with machine learning approaches. Two feature extraction algorithms, namely CountVectoriser (CV) and term frequency-inverse document frequency (TFIDF), were employed separately for extracting the most relevant features from the dataset. These features were fed to multinomial naive Bayes (MNB), random forest (RF), support vector classifier (SVC) and logistic regression (LR) classifiers, creating a total of eight classification models for classifying fake news. A solitary CV-based model was considered the baseline for predicting fake news in the r/theonion and r/nottheonion datasets. GridSearchCV was also used to obtain training and testing scores for the selected parameters. Of these models, TFIDF with MNB achieved the best accuracy, 79.05%.
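The best-performing configuration reported above (TF-IDF features, a multinomial naive Bayes classifier, and GridSearchCV for parameter tuning) can be sketched in scikit-learn. This is a hedged illustration, not the chapter's code; the headlines, labels, and parameter grid below are invented stand-ins for the r/theonion and r/nottheonion data.

```python
# Sketch of the TFIDF + multinomial naive Bayes pipeline tuned with
# GridSearchCV, per the setup described above. All data is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

headlines = [
    "scientists discover water is wet, nation shocked",
    "local man heroically eats entire pizza in one sitting",
    "city council approves new budget for road repairs",
    "school board announces revised exam schedule",
] * 3  # repeated so 3-fold cross-validation has samples of each class per fold
labels = [1, 1, 0, 0] * 3  # 1 = satirical ("fake") headline

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())])
search = GridSearchCV(
    pipeline,
    param_grid={"nb__alpha": [0.1, 1.0], "tfidf__ngram_range": [(1, 1), (1, 2)]},
    cv=3,
)
search.fit(headlines, labels)
print(search.best_params_, search.best_score_)
```

Swapping `TfidfVectorizer` for `CountVectorizer` and `MultinomialNB` for the other three classifiers reproduces the eight-model comparison structure the chapter describes.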
- Research Article
49
- 10.1111/bjet.12875
- Aug 26, 2019
- British Journal of Educational Technology
In this study, we explore the potential of a natural language processing (NLP) approach to support discourse analysis of in-situ, small group learning conversations. The theoretical basis of this work derives from Bakhtin's notion of speech genres as bounded by educational robotics activity. Our goal is to leverage computational linguistics methods to advance and improve educational research methods. We used a parts-of-speech (POS) tagging program to automatically parse a transcript of spoken dialogue collected from a small group of middle school students involved in solving a robotics challenge. We grammatically parsed the dialogue at the level of the trigram. Then, through a deliberative process, we mapped the POS trigrams to our theoretically derived problem solving in computational environments coding system. Next, we developed a stacked histogram visualization to identify rich interactional segments in the data. Seven segments of the transcript were thus identified for closer analysis. Our NLP-based approach partially replicated prior findings. Here, we present the theoretical basis for the work, our analytical approach in exploring this NLP-based method, and our research findings.
Practitioner Notes
What is already known about this topic
- Over the last 10 years, several educational research papers indicate that natural language processing (NLP) techniques can be used to help interpret well-structured, written dialogue, e.g., conversations in online class discussions.
- Two recent papers indicate that NLP techniques can also be used to help interpret well-structured, spoken dialogue, e.g., replies to interview questions and/or comments made during think-aloud protocols.
- Multimodal learning analytic (MMLA) techniques are being used to investigate collaborative learning. These studies use non-verbal features of data (gaze, gesture, physical actions), prosodic features of verbal data (pitch and tone) and/or turn-taking and duration of talk per speaker, as means of predicting group success. None of the MMLA studies attempt semantic analysis of student talk in collaborative settings.
What this paper adds
- A theoretical framework for why and how an automated NLP approach can support discourse analysis research on co-located, computer-based, collaborative problem-solving interactions. This framework, entitled the Problem Solving in Computational Environment (PSCE) Speech Genre, links children's physical interactions with computational devices to their verbal exchanges and presents a theoretical rationale for the use of NLP methods in educational research.
- Description of an interdisciplinary method that combines NLP techniques with qualitative coding approaches to support analysis of student collaborative learning with educational robotics.
- Identification of student learning outcomes derived from the semantic, PSCE Speech Genre and NLP approach.
Implications for practice and/or policy
- Educational researchers will be able to expand upon our findings towards the goal of using computation and automation to support microgenetic analysis of large datasets.
- Robust microgenetic learning findings will provide curriculum developers, educational technology developers and teachers with guidance on how to construct and/or create learning materials and environments.
- From an interdisciplinary perspective, this research can support further exploration of conversational dialogues that are ill-structured, indexical and referential.
- This research will support the further development of machine learning techniques and neural network models by computational linguists.
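The core computational step above, extracting POS trigrams and mapping them to a theoretically derived coding system, can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it assumes the dialogue has already been POS-tagged, and the trigram-to-code mapping (`CODE_MAP`) is a hypothetical stand-in for the paper's coding system.

```python
# Sketch of POS-trigram extraction and mapping to discourse codes,
# per the method described above. Tagging is assumed already done;
# the CODE_MAP categories are hypothetical examples.
from collections import Counter

def pos_trigrams(tags):
    """Return consecutive POS-tag trigrams from one tagged turn."""
    return [tuple(tags[i:i + 3]) for i in range(len(tags) - 2)]

# Hypothetical mapping from POS trigrams to coding-system categories.
CODE_MAP = {
    ("PRP", "VB", "NN"): "directive",   # e.g. "you press button"
    ("MD", "PRP", "VB"): "proposal",    # e.g. "should we try"
    ("WRB", "VBZ", "PRP"): "question",  # e.g. "why does it"
}

def code_counts(tagged_turns):
    """Count mapped discourse codes across a list of POS-tag sequences."""
    counts = Counter()
    for tags in tagged_turns:
        for tri in pos_trigrams(tags):
            if tri in CODE_MAP:
                counts[CODE_MAP[tri]] += 1
    return counts

turns = [
    ["PRP", "VB", "NN", "RB"],   # yields one "directive" trigram
    ["MD", "PRP", "VB", "NN"],   # yields "proposal" then "directive"
]
print(code_counts(turns))  # Counter({'directive': 2, 'proposal': 1})
```

Plotting these per-segment counts as a stacked histogram, as the study does, would then surface the interaction-rich transcript segments for closer qualitative analysis.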
- Research Article
17
- 10.1016/j.ejca.2020.11.030
- Dec 26, 2020
- European Journal of Cancer
Machine learning and natural language processing (NLP) approach to predict early progression to first-line treatment in real-world hormone receptor-positive (HR+)/HER2-negative advanced breast cancer patients.
- Research Article
15
- 10.1093/jamia/ocaa263
- Nov 17, 2020
- Journal of the American Medical Informatics Association
To apply natural language processing (NLP) techniques to identify individual events and modes of communication between healthcare professionals and families of critically ill patients from electronic medical records (EMR). Retrospective cohort study of 280 randomly selected adult patients admitted to 1 of 15 intensive care units (ICU) in Alberta, Canada from June 19, 2012 to June 11, 2018. Individual events and modes of communication were independently abstracted using NLP and manual chart review (reference standard). Preprocessing techniques and 2 NLP approaches (rule-based and machine learning) were evaluated using sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). Over 2700 combinations of NLP methods and hyperparameters were evaluated for each mode of communication using a holdout subset. The rule-based approach had the highest AUROC in 65 datasets, compared to the machine learning approach in 21 datasets; both approaches had similar performance in 17 datasets. The rule-based AUROCs for the grouped categories of patient documented to have family or friends (0.972, 95% CI 0.934-1.000), visit by family/friend (0.882, 95% CI 0.820-0.943) and phone call with family/friend (0.975, 95% CI 0.952-0.998) were high. We report an automated method to quantify communication between healthcare professionals and family members of adult patients from free-text EMRs. A rule-based NLP approach had better overall operating characteristics than a machine learning approach. NLP can automatically and accurately measure the frequency and mode of documented family visitation and communication from unstructured free-text EMRs, to support patient- and family-centered care initiatives.
- Research Article
1
- 10.3390/electronics11152374
- Jul 29, 2022
- Electronics
One of the most impressive applications of the combined use of natural language processing (NLP), classical machine learning, and deep learning (DL) approaches is the estimation of demographic traits from text. Author Profiling (AP) is the analysis of a text to identify the demographics or characteristics of its author. So far, most researchers in this field have focused on social media data in the English language. This article aims to expand the predictive potential of demographic traits by focusing on a more diverse dataset and language. Knowing the background of deputies is essential for citizens, political scientists and policymakers. In this study, we present the application of NLP and machine learning (ML) approaches to Turkish parliamentary debates to estimate the demographic traits of the deputies. Seven traits were examined: gender, age, education, occupation, election region, party, and party status. As a first step, a corpus was compiled from Turkish parliamentary debates between 2012 and 2020. Document representations (feature extraction) were built using various NLP techniques. Then, we created sub-datasets containing the extracted features from the corpus, and these sub-datasets were used by different ML classification algorithms. The best classification accuracies exceeded the majority baseline by more than 31%, 27%, 35%, 41%, 29%, 59%, and 32% for gender, age, education, occupation, election region, party, and party status, respectively. The experimental results show that the demographics of deputies can be estimated effectively using NLP, classical ML, and DL approaches.
- Research Article
32
- 10.1161/circinterventions.120.009447
- Oct 1, 2020
- Circulation: Cardiovascular Interventions
Peripheral artery disease (PAD) is underrecognized, undertreated, and understudied; addressing each of these problems requires efficient and accurate identification of patients with PAD. Currently, PAD patient identification relies on diagnosis/procedure codes or lists of patients diagnosed or treated by specific providers in specific locations and ways. The goal of this research was to leverage natural language processing to identify patients with PAD in an electronic health record system more accurately than a structured data-based approach. The clinical notes from a cohort of 6861 patients in our health system whose PAD status had previously been adjudicated were used to train, test, and validate a natural language processing model using 10-fold cross-validation. The performance of this model was described using the area under the receiver operating characteristic (ROC) and average precision curves; its performance was quantitatively compared with an administrative data-based least absolute shrinkage and selection operator (LASSO) approach using the DeLong test. The median (SD) area under the ROC curve for the natural language processing model was 0.888 (0.009) versus 0.801 (0.017) for the LASSO-based approach alone (DeLong P<0.0001). The median (SD) area under the average precision curve was 0.909 (0.008) versus 0.816 (0.012) for the structured data-based approach. When sensitivity was set at 90%, precision was 65% for LASSO and 74% for the machine learning approach, while specificity was 41% for LASSO and 62% for the machine learning approach. Using a natural language processing approach in addition to partial cohort preprocessing with a LASSO-based model, we were able to meaningfully improve our ability to identify patients with PAD compared with an approach using structured data alone. This model has potential applications both to interventions targeted at improving patient care and to efficient, large-scale PAD research. Graphic Abstract: A graphic abstract is available for this article.
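The structured-data baseline above, an L1-penalized ("LASSO") model over administrative features, can be sketched with scikit-learn's logistic regression. This is an illustrative sketch only: the feature columns and synthetic outcome below are invented, not the study's variables.

```python
# Sketch of an L1-penalized ("LASSO") logistic regression over structured
# administrative features, the baseline the study compares against.
# Features and outcome are synthetic illustrations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical binary indicators, e.g. PAD diagnosis code present,
# revascularization procedure code, statin order, vascular clinic visit.
X = rng.integers(0, 2, size=(200, 4)).astype(float)
# Synthetic outcome loosely driven by the first two indicators.
y = ((X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 200)) > 1).astype(int)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
lasso.fit(X, y)
print("coefficients:", lasso.coef_.round(2))
```

The L1 penalty drives uninformative coefficients toward zero, which is why LASSO-style models are a common feature-selecting baseline for administrative-code cohorts.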
- Research Article
160
- 10.1038/s41598-018-25773-2
- May 9, 2018
- Scientific Reports
Research into suicide prevention has been hampered by methodological limitations such as low sample size and recall bias. Recently, Natural Language Processing (NLP) strategies have been used with Electronic Health Records to increase information extraction from free-text notes as well as structured fields concerning suicidality, giving access to much larger cohorts than previously possible. This paper presents two novel NLP approaches: a rule-based approach to classify the presence of suicide ideation and a hybrid machine learning and rule-based approach to identify suicide attempts in a psychiatric clinical database. Good performance of the two classifiers in the evaluation study suggests they can be used to accurately detect mentions of suicide ideation and attempt within free-text documents in this psychiatric database. The novelty of the two approaches lies in the malleability of each classifier if a need arises to refine performance or meet alternate classification requirements. The algorithms can also be adapted to fit the infrastructures of other clinical datasets given sufficient knowledge of clinical recording practices, without dependency on medical codes or additional data extraction of known risk factors to predict suicidal behaviour.
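A rule-based classifier of the kind described above typically pairs mention-matching patterns with negation suppression, so that notes recording "denies suicidal ideation" are not flagged. The following is a minimal hedged sketch, not the paper's classifier; the keyword and negation patterns are invented examples of the rule style.

```python
# Minimal illustrative sketch of a rule-based ideation-mention flagger
# with simple negation suppression. Patterns are invented examples,
# not the published rule set.
import re

IDEATION = re.compile(r"\bsuicidal ideation\b|\bthoughts of suicide\b", re.I)
NEGATION = re.compile(
    r"\b(denies|denied|no)\b[^.]{0,40}(suicidal ideation|thoughts of suicide)",
    re.I,
)

def flag_ideation(note: str) -> bool:
    """True if the note asserts suicide ideation (negated mentions excluded)."""
    return bool(IDEATION.search(note)) and not NEGATION.search(note)

print(flag_ideation("Patient reports thoughts of suicide this week."))  # True
print(flag_ideation("Patient denies suicidal ideation at this time."))  # False
```

The malleability the paper highlights is visible here: adapting the classifier to another dataset's recording practices amounts to editing the pattern lists rather than retraining a model.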
- Research Article
3
- 10.32629/jai.v6i2.623
- Aug 4, 2023
- Journal of Autonomous Intelligence
Nowadays, social media has become a forum for people to express their views on issues such as sexual orientation, legislation, and taxes. Sexual orientation refers to the people to whom an individual is attracted and with whom they wish to be engaged. Many people worldwide identify with different sexual orientations, including lesbian, gay, bisexual, transgender, queer, and many more (LGBTQ+). Because of public stigmatization, many LGBTQ+ persons turn to social media to express themselves, sometimes anonymously. The present study aims to use natural language processing (NLP) and machine learning (ML) approaches to assess the experiences of LGBTQ+ persons. The study used lexicon-based sentiment analysis (SA) and trained six distinct machine classifiers: logistic regression (LR), support vector machine (SVM), naïve Bayes (NB), decision tree (DT), random forest (RF), and gradient boosting (GB). According to the SA results, individuals are positive about LGBTQ+ concerns; yet the negative sentiment scores indicate that prejudice and harsh statements against LGBTQ+ people persist in many regions where they live. Furthermore, the LR, SVM, NB, DT, RF, and GB classifiers attained considerable accuracy values of 97%, 96%, 88%, 100%, 92%, and 91%, respectively, with significant recall and precision values on the performance assessment metrics used. This study will assist the government, non-governmental organizations, and rights advocacy groups in making informed decisions about LGBTQ+ concerns in order to ensure a sustainable future and peaceful coexistence.
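Lexicon-based sentiment analysis, the SA step described above, scores a text by counting matches against positive and negative word lists. The following is a toy sketch of the idea; the study's actual lexicon and preprocessing are not specified here, and the word lists below are invented examples.

```python
# Toy sketch of lexicon-based sentiment scoring, as described above.
# The POSITIVE / NEGATIVE word lists are invented illustrations.
POSITIVE = {"support", "love", "proud", "equal", "accept"}
NEGATIVE = {"hate", "prejudice", "harsh", "stigma", "discrimination"}

def lexicon_sentiment(text: str) -> str:
    """Classify text as positive/negative/neutral by lexicon word counts."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_sentiment("We support and accept everyone!"))    # positive
print(lexicon_sentiment("So much prejudice and hate online"))  # negative
```

In a pipeline like the study's, these lexicon-derived labels could then serve as training targets for the supervised classifiers (LR, SVM, NB, DT, RF, GB).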