NLP Pipeline Research Articles

Background: Adolescents and young adults account for over 21% of new HIV infections in the U.S., with most new cases among young men. Social media, an important information source for this group, can uniquely reveal the perspectives and communicative patterns of this key population. We identified 6,439 young male Twitter users (ages 13–24) in the U.S. using an NLP pipeline with geolocation data. From their Twitter timelines, we collected 24,600 HIV-related tweets; the most retweeted and favorited of these (n = 472) were examined through content analysis.

Results: Three themes arose in this viral online discourse about HIV among young men: (i) othering, (ii) politics and activism, and (iii) risk and wellness. Othering tweets contained stigmatizing jokes and insults that alienated people with HIV and individuals who identify as lesbian, gay, bisexual, transgender, queer or questioning, intersex, or asexual, or who are elsewhere on the gender and sexuality spectrum (LGBTQIA+). Politics and activism tweets discussed awareness, stigma, HIV criminalization, violence, and LGBTQIA+ and women’s rights. Risk and wellness tweets discussed either risk behaviors for sexually transmitted infections (STIs) (e.g., condomless sex, transactional sex, multiple sexual partners) or safer sex and preventive practices (e.g., pre-exposure prophylaxis [PrEP], condom use, achieving an undetectable viral load, medication adherence, and STI testing).

Conclusion: The social acceptability of high-risk sexual behaviors is high among young male Twitter users. Because social media is double-edged, promoting health (e.g., awareness, health activism) as well as risk (e.g., endorsement of risky behavior, identity attacks), this population may benefit from targeted health communication interventions. Future HIV prevention efforts should counter the stigma, misinformation, and risk-promoting viral messages prevalent on social media.
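The tweet-selection step in this abstract (collect HIV-related tweets, then keep the most retweeted and favorited ones) can be sketched as a keyword filter followed by an engagement ranking. This is a minimal illustration under stated assumptions, not the study's pipeline: the keyword list, the `Tweet` record, and ranking by retweets plus favorites are all hypothetical, and the user-identification step (age, gender, geolocation) is not modeled.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical keyword list; the abstract does not publish the study's actual HIV-related terms.
HIV_PATTERN = re.compile(r"\b(hiv|aids|prep|truvada|undetectable)\b", re.IGNORECASE)


@dataclass
class Tweet:
    text: str
    retweets: int
    favorites: int


def is_hiv_related(tweet: Tweet) -> bool:
    """True if any keyword from HIV_PATTERN appears as a whole word (case-insensitive)."""
    return HIV_PATTERN.search(tweet.text) is not None


def top_engaged(tweets: List[Tweet], n: int) -> List[Tweet]:
    """Keep HIV-related tweets, then return the n with the most retweets + favorites."""
    related = [t for t in tweets if is_hiv_related(t)]
    # Assumed engagement score: the abstract says "most retweeted and favorited"
    # but does not say how the two counts were combined.
    return sorted(related, key=lambda t: t.retweets + t.favorites, reverse=True)[:n]


if __name__ == "__main__":
    timeline = [
        Tweet("Get tested for HIV this week", retweets=120, favorites=300),
        Tweet("Meal prep Sunday", retweets=5, favorites=40),  # false positive on "prep"
        Tweet("PrEP works. Ask your doctor.", retweets=80, favorites=90),
        Tweet("Nice weather today", retweets=500, favorites=900),  # filtered out
    ]
    print([t.text for t in top_engaged(timeline, 2)])
    # ['Get tested for HIV this week', 'PrEP works. Ask your doctor.']
```

Plain keyword matching is cheap but noisy: "Meal prep Sunday" passes the filter, which illustrates why keyword-selected tweets still need human review before their content is interpreted.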

Background: Natural language processing (NLP) methods are powerful tools for extracting and analyzing critical information from free-text data. MedTaggerIE, an open-source NLP pipeline for information extraction based on text patterns, has been widely used to annotate clinical notes. A rule-based system built on MedTaggerIE, MedTagger-total hip arthroplasty (THA), was previously shown to correctly identify the surgical approach, fixation, and bearing surface from THA operative notes at Mayo Clinic.

Objective: This study aimed to assess the implementability, usability, and portability of MedTagger-THA at two external institutions, Michigan Medicine and the University of Iowa, and to provide lessons learned for best practices.

Methods: We conducted iterative test-apply-refinement processes at three sites: the development site (Mayo Clinic) and two deployment sites (Michigan Medicine and the University of Iowa). Mayo Clinic was the primary NLP development site, with the THA registry as the gold standard. Activities at the two deployment sites included extracting the operative notes, developing the gold standard (Michigan: registry data; Iowa: manual chart review), refining the NLP algorithms on training data, and evaluating them on test data. Error analyses were conducted to understand language variation across sites. To further assess the model's specificity for approach and fixation, we applied the refined MedTagger-THA to arthroscopic hip procedures and periacetabular osteotomy cases, since neither type of operative note should contain any approach or fixation keywords.

Results: MedTagger-THA algorithms were implemented and refined independently at both sites. At Michigan, the study comprised THA-related notes for 2569 patient-date pairs. Before model refinement, MedTagger-THA demonstrated excellent accuracy for approach (96.6%, 95% CI 94.6%-97.9%) and fixation (95.7%, 95% CI 92.4%-97.6%). These results were comparable with the internal accuracy at the development site (99.2% for approach and 90.7% for fixation). Model refinement improved accuracy slightly for both approach (99%, 95% CI 97.6%-99.6%) and fixation (98%, 95% CI 95.3%-99.3%). The specificity of approach identification was 88.9% for arthroscopy cases, and the specificity of fixation identification was 100% for both periacetabular osteotomy and arthroscopy cases. At Iowa, the study comprised 100 operative notes (50 training notes and 50 test notes). MedTagger-THA achieved moderate-to-high performance on the training data. After model refinement, the model achieved high performance for approach (100%, 95% CI 91.3%-100%), fixation (98%, 95% CI 88.3%-100%), and bearing surface (92%, 95% CI 80.5%-97.3%).

Conclusions: The MedTagger-THA algorithms achieved high performance across centers, demonstrating that they were sufficiently implementable, usable, and portable to different deployment sites. This study provides important lessons learned during model deployment and validation and can serve as a reference for transferring rule-based electronic health record models.
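A pattern-based extractor of the kind described in this abstract maps text patterns to normalized values; its output can then be scored against a gold standard for accuracy and, on notes that should contain no keywords (such as arthroscopy cases), for specificity. The sketch below is a hypothetical illustration in plain Python: the regular expressions, the label names, and the "hybrid" fixation rule are assumptions, it is not MedTagger-THA's rule set, it does not use the MedTaggerIE API, and it omits bearing surface. For the 95% CIs it uses the Wilson score interval as one common choice; the abstract does not state which interval method the authors used, so the sketch is not expected to reproduce the reported bounds.

```python
import math
import re
from typing import List, Optional, Set, Tuple

# Hypothetical, illustrative patterns only; NOT the actual MedTagger-THA rules.
# Word boundaries (\b) keep "lateral approach" from firing inside
# "posterolateral approach" and "cemented" from firing inside "uncemented".
RULES = {
    "approach": {
        "anterior": re.compile(r"\banterior approach\b", re.I),  # also covers "direct anterior"
        "anterolateral": re.compile(r"\banterolateral approach\b", re.I),
        "lateral": re.compile(r"\blateral approach\b", re.I),  # also covers "direct lateral"
        "posterior": re.compile(r"\b(posterior|posterolateral) approach\b", re.I),
    },
    "fixation": {
        "cemented": re.compile(r"\bcemented\b", re.I),
        "uncemented": re.compile(r"\b(uncemented|cementless|press-fit|press fit)\b", re.I),
    },
}


def matched_labels(note: str, field: str) -> Set[str]:
    """All labels for `field` whose pattern fires anywhere in the note."""
    return {label for label, pattern in RULES[field].items() if pattern.search(note)}


def extract(note: str, field: str) -> Optional[str]:
    """Return one normalized label, or None if nothing (or conflicting labels) fired."""
    hits = matched_labels(note, field)
    # Assumed post-processing rule: cemented + uncemented components -> "hybrid".
    if field == "fixation" and hits == {"cemented", "uncemented"}:
        return "hybrid"
    return hits.pop() if len(hits) == 1 else None


def accuracy_counts(notes: List[str], gold: List[str], field: str) -> Tuple[int, int]:
    """(correct, total): notes whose extracted label equals the gold label."""
    correct = sum(extract(note, field) == label for note, label in zip(notes, gold))
    return correct, len(notes)


def specificity_counts(negative_notes: List[str], field: str) -> Tuple[int, int]:
    """(true negatives, total) on notes that should contain no `field` keywords.

    Every note here is a negative case, so TN + FP == len(negative_notes).
    """
    true_negatives = sum(not matched_labels(note, field) for note in negative_notes)
    return true_negatives, len(negative_notes)


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    note = "Through a direct anterior approach, a press-fit cup and cemented stem were placed."
    print(extract(note, "approach"), extract(note, "fixation"))  # anterior hybrid
    scopes = ["Hip arthroscopy with labral debridement; no implants placed."]
    print(specificity_counts(scopes, "fixation"))  # (1, 1)
```

Much of the site-specific refinement in a rule-based system of this kind happens in the patterns themselves; a deployment site that writes "posterolateral" or "press fit" differently from the development site needs those variants added, which is the sort of language variation an error analysis would surface.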

Articles published on NLP Pipeline

Annotation of epilepsy clinic letters for natural language processing

Semantic Mapping of Named-Entities in openEHR Templates and Ad-hoc Generation of Compositions.

From virus to viral: content analysis of HIV-related Twitter messages among young men in the U.S.

Unveiling Fall Risk Factors: Nurse-Driven Corpus Development for Natural Language Processing.

ReDWINE: A clinical datamart with text analytical capabilities to facilitate rehabilitation research

LayoutQT—Layout Quadrant Tags to embed visual features for document analysis

Identifying learners' topical interests from social media content to enrich their course preferences in MOOCs using topic modeling and NLP techniques.

A NLP Pipeline for the Automatic Extraction of a Complete Microorganism's Picture from Microbiological Notes.

Multicenter Validation of Natural Language Processing Algorithms for the Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty: Algorithm Development and Validation.

Evaluating Various Tokenizers for Arabic Text Classification

Mapping SNOMED CT Codes to Semi-Structured Texts via an NLP Pipeline.

Connecting the dots in clinical document understanding with Relation Extraction at scale

Upon Improving the Performance of Localized Healthcare Virtual Assistants.

Exploiting co-occurrence networks for classification of implicit inter-relationships in legal texts

The Unified Medical Language System SPECIALIST Lexicon and Lexical Tools: Development and applications.

Development and validation of a pragmatic natural language processing approach to identifying falls in older adults in the emergency department

A New Proposal for Evaluating Web Page Cleaning Tools

SEPIR: a semantic and personalised information retrieval tool for the public administration based on distributional semantics

Testing Pre-Annotation to Help Non-Experts Identify Drug-Drug Interactions Mentioned in Drug Product Labeling

TagCurate: crowdsourcing the verification of biomedical annotations to mobile users
