Speech Tagging Research Articles

In current nuclear power plants (NPPs), a large amount of condition-based data is generated and stored to assess and monitor component health and performance. This data can be either numeric (e.g., pump vibration data) or textual (e.g., condition reports that assess component health). While component health can be assessed from numeric data with a large variety of methods, extracting information from textual data remains a challenge. Natural language processing (NLP) methods are starting to be deployed in current NPPs, mainly to filter out incident reports (IRs) that are not safety related by employing supervised machine learning methods. However, these methods do not provide the quantitative information that might be contained in IRs. This paper presents an approach to extract information from textual data (e.g., from IRs and maintenance reports) that is based on NLP data analytics methods coupled with model-based systems engineering (MBSE) models. NLP methods are employed to perform syntactic and semantic analyses. Syntactic analysis examines the grammatical structure of a sentence; it includes part-of-speech (POS) tagging (i.e., identification of the grammatical elements of each string, e.g., nouns, verbs), named entity recognition (i.e., identification of text entities, e.g., names, dates, events), and relation extraction (e.g., coreference resolution). Semantic analysis, on the other hand, is designed to analyze the logical structure of a sentence. Through a specific set of rules, our methods can identify whether a sentence contains health information about a component (e.g., degraded performance, anomalous behavior) or a causal relationship between two events (i.e., a cause-effect pair). An innovative element of our approach is that semantic analysis relies on MBSE models to identify links between textual elements. MBSE models are diagrams designed to represent system and component dependencies (from both a form and a functional point of view). In our approach, MBSE models emulate system engineer knowledge about component/system architecture. This paper presents in detail how the integration of NLP methods and MBSE models is performed. A few analysis examples focusing on centrifugal pumps will be presented.
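The rule-based idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the component dictionary `COMPONENT_MODEL` (a stand-in for what an MBSE model would supply), the keyword sets, and all function names are hypothetical, and plain keyword matching replaces the POS tagging and parsing that a real pipeline would use.

```python
import re

# Hypothetical stand-in for an MBSE model: each component maps to its parts.
COMPONENT_MODEL = {
    "centrifugal pump": ["impeller", "bearing", "seal", "shaft"],
}

# Hypothetical keyword rules (assumptions made for this illustration only).
HEALTH_TERMS = {"degraded", "leaking", "worn", "cracked", "anomalous"}
CAUSAL_MARKERS = ("caused by", "due to", "because of")


def find_entities(sentence):
    """Return the known components and parts mentioned in the sentence."""
    text = sentence.lower()
    found = []
    for component, parts in COMPONENT_MODEL.items():
        for name in [component, *parts]:
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                found.append(name)
    return found


def health_terms(sentence):
    """Return the health-related keywords present in the sentence, sorted."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return sorted(tokens & HEALTH_TERMS)


def cause_effect(sentence):
    """Split the sentence at the first causal marker into (cause, effect)."""
    text = sentence.lower().rstrip(".")
    for marker in CAUSAL_MARKERS:
        if marker in text:
            effect, cause = text.split(marker, 1)
            return cause.strip(), effect.strip()
    return None


def is_part_of(part, component):
    """Use the model to check whether two text entities are linked."""
    return part in COMPONENT_MODEL.get(component, [])
```

For a sentence such as "Degraded flow in the centrifugal pump was caused by a worn impeller.", `find_entities` picks out the pump and the impeller, `health_terms` flags "degraded" and "worn", `cause_effect` splits the sentence at "caused by", and `is_part_of` confirms from the model that the impeller belongs to the pump.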

Introduction: Unstructured text data (UTD) are increasingly found in many databases that were never intended to be used for research, including electronic medical record (EMR) databases. Data quality can impact the usefulness of UTD for research. UTD are typically prepared for analysis (i.e., preprocessed) and analyzed using natural language processing (NLP) techniques. Different NLP methods are used to preprocess UTD and may affect data quality. Objective: Our objective was to systematically document current research and practices about NLP preprocessing methods to describe or improve the quality of UTD, including UTD found in EMR databases. Methods: A scoping review was undertaken of peer-reviewed studies published between December 2002 and January 2021. Scopus, Web of Science, ProQuest, and EBSCOhost were searched for literature relevant to the study objective. Information extracted from the studies included article characteristics (i.e., year of publication, journal discipline), data characteristics, types of preprocessing methods, and data quality topics. Study data were presented using a narrative synthesis. Results: A total of 41 articles were included in the scoping review; over 50% were published between 2016 and 2021. Almost 20% of the articles were published in health science journals. Common preprocessing methods included removal of extraneous text elements (such as stop words, punctuation, and numbers), word tokenization, and part-of-speech tagging. Data quality topics for articles about EMR data included misspelled words, security (i.e., de-identification), word variability, sources of noise, quality of annotations, and ambiguity of abbreviations. Conclusions: Multiple NLP techniques have been proposed to preprocess UTD, with some differences in techniques applied to EMR data. There are similarities in the data quality dimensions used to characterize structured data and UTD. A few general-purpose measures of data quality do not require external data; most of these focus on the measurement of noise.
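Three of the common preprocessing steps named in this abstract (removing punctuation and numbers, word tokenization, and stop-word removal) can be sketched in a few lines of Python. This is an illustrative sketch only: the `preprocess` function and the tiny `STOP_WORDS` list are assumptions made here, not taken from any of the reviewed studies, and real pipelines typically use larger, domain-specific stop-word lists.

```python
import re

# Tiny illustrative stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in",
              "is", "was", "for", "with", "on"}


def preprocess(text):
    """Lowercase the text, tokenize it into words, and drop stop words.

    Keeping only runs of letters discards punctuation and numbers
    in the same pass as tokenization.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]
```

Calling `preprocess("The patient was given 20 mg of Aspirin, twice daily.")` returns `['patient', 'given', 'mg', 'aspirin', 'twice', 'daily']`. The dose value is lost in this example, which shows how a preprocessing choice can itself affect data quality.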

Articles published on Speech Tagging

COVID-19 INFODEMIC – UNDERSTANDING CONTENT FEATURES IN DETECTING FAKE NEWS USING A MACHINE LEARNING APPROACH

Product review opinion based on sentiment analysis

A deep learning approach to building a framework for Urdu POS and NER

Tagging Efficiency Analysis of Part of Speech Taggers on Indonesian News

An Emotion-Based Rating System for Books Using Sentiment Analysis and Machine Learning in the Cloud

Exploring the Performance of Farasa and CAMeL Taggers for Arabic Dialect Tweets

Intelligent Part of Speech tagger for Hindi

Aspect-Based Sentiment Analysis for Social Multimedia: A Hybrid Computational Framework

Neural Attention Model for Abstractive Text Summarization Using Linguistic Feature Space

An Hybrid Part of Speech Tagger for Setswana Language using a Voting Method

Neural POS tagging of shahmukhi by using contextualized word representations

A Grammatically and Structurally Based Part of Speech (POS) Tagger for Arabic Language

Model Based Approach to Extract Health Information from Textual Data

Analysis of Telkom University News Subjects on Popular Indonesian News Portals Using a Combination of Hidden Markov Model (HMM) and Rule Based Methods

Sentiment analysis on social media tweets using dimensionality reduction and natural language processing

What we talk about when we talk about EEMs: using text mining and topic modeling to understand building energy efficiency measures (1836-RP)

A scoping review of preprocessing methods for unstructured text data to assess data quality

Analyzing Moravian Feelings Using Computational Methods to Ask Questions about Norms and Sentiments in Eighteenth-Century Moravian Lebensläufe

POS Tagger Improvisation with the Addition of Foreign Word Labels on Telkom University News

Advanced Intelligent English Translation Based on Multisensor Data Fusion Optimization
