Neural Sequence Models Research Articles

Real-time air pollution monitoring is a valuable tool for public health and environmental surveillance. In recent years, there has been a dramatic increase in air pollution forecasting and monitoring research using artificial neural networks. Most prior work relied on modeling pollutant concentrations collected from ground-based monitors and meteorological data for long-term forecasting of outdoor ozone (O3), oxides of nitrogen, and fine particulate matter (PM2.5). Because traditional, highly sophisticated air quality monitors are expensive and not universally available, these models cannot adequately serve people who do not live near pollutant monitoring sites. Furthermore, because prior models were built on physical measurement data collected from sensors, they may not be suitable for predicting the public health effects of pollution exposure. This study aimed to develop and validate models to nowcast observed pollution levels using web search data, which are publicly available in near real time from major search engines. We developed novel machine learning-based models, using both traditional supervised classification methods and state-of-the-art deep learning methods, to detect elevated air pollution levels at the US city level from generally available meteorological data and aggregate web-based search volume data derived from Google Trends. We validated the performance of these methods by predicting 3 critical air pollutants (O3, nitrogen dioxide, and PM2.5) across 10 major US metropolitan statistical areas in 2017 and 2018. We also explored different variations of the long short-term memory (LSTM) model and proposed a novel search term dictionary learner-long short-term memory model to learn sequential patterns across multiple search terms for prediction.
The top-performing model was an LSTM deep neural sequence model using meteorological and web search data. When used for detecting elevated pollution levels, it reached an accuracy of 0.82 (F1-score 0.51) for O3, 0.74 (F1-score 0.41) for nitrogen dioxide, and 0.85 (F1-score 0.27) for PM2.5. Compared with using only meteorological data, the proposed method achieved superior accuracy by incorporating web search data. The results show that combining web search data with meteorological data improves the nowcasting performance for all 3 pollutants and suggest promising novel applications for tracking global physical phenomena using web search data.
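The abstract reports high accuracy alongside much lower F1-scores (for example, 0.85 and 0.27 for PM2.5). A gap like this can arise when elevated-pollution days are rare, because accuracy rewards correct "not elevated" predictions while F1 ignores them. The sketch below illustrates the arithmetic only; the function name `accuracy_and_f1` and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for a binary detector from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall; true negatives do not enter it.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical counts for 400 city-days, only 40 of which are truly "elevated".
acc, f1 = accuracy_and_f1(tp=10, fp=20, fn=30, tn=340)
print(f"accuracy={acc:.3f}  F1={f1:.3f}")  # accuracy=0.875  F1=0.286
```

With these made-up counts the detector is right on 87.5% of days yet finds only a quarter of the elevated days, which is why both numbers are worth reading together.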

Background: The recognition of medical entities from natural language is a ubiquitous problem in the medical field, with applications ranging from medical coding to the analysis of electronic health data for public health. It is, however, a complex task usually requiring human expert intervention, thus making it expensive and time-consuming. Recent advances in artificial intelligence, specifically the rise of deep learning methods, have enabled computers to make efficient decisions on a number of complex problems, with the notable example of neural sequence models and their powerful applications in natural language processing. However, they require a considerable amount of data to learn from, which is typically their main limiting factor. The Centre for Epidemiology on Medical Causes of Death (CépiDc) stores an exhaustive database of death certificates at the French national scale, amounting to several million natural language examples provided with their associated human-coded medical entities available to the machine learning practitioner.
Objective: The aim of this paper was to investigate the application of deep neural sequence models to the problem of medical entity recognition from natural language.
Methods: The investigated data set included every French death certificate from 2011 to 2016. These certificates contain information such as the subject’s age, the subject’s gender, and the chain of events leading to his or her death, both in French and encoded as International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) medical entities, for a total of around 3 million observations in the data set. The task of automatically recognizing ICD-10 medical entities from the French natural language–based chain of events leading to death was then formulated as a sequence-to-sequence predictive modeling problem. A deep neural network–based model, known as the Transformer, was then slightly adapted and fit to the data set. Its performance was then assessed on an external data set and compared to the current state-of-the-art approach. CIs for derived measurements were estimated via bootstrapping.
Results: The proposed approach resulted in an F-measure value of 0.952 (95% CI 0.946-0.957), which constitutes a significant improvement over the current state-of-the-art approach and its previously reported F-measure value of 0.825 as assessed on a comparable data set. Such an improvement makes possible a whole field of new applications, from nosologist-level automated coding to temporal harmonization of death statistics.
Conclusions: This paper shows that a deep artificial neural network can directly learn from voluminous data sets in order to identify complex relationships between natural language and medical entities, without any explicit prior knowledge. Although not entirely free from mistakes, the derived model constitutes a powerful tool for automated coding of medical entities from medical language with promising potential applications.
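The abstract states that CIs were estimated via bootstrapping but does not say which variant was used. As a rough illustration, the sketch below computes a percentile bootstrap interval for an F-measure. It assumes binary labels and per-example resampling, which simplifies the multi-code ICD-10 setting; the function names `f_measure` and `bootstrap_f_measure_ci`, their parameters, and the toy labels are all hypothetical.

```python
import random


def f_measure(y_true, y_pred):
    """F1 for binary labels (1 = positive); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f_measure_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample examples with replacement, recompute F1."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(f_measure([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0] * 20
y_pred = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0] * 20
print(f_measure(y_true, y_pred), bootstrap_f_measure_ci(y_true, y_pred))
```

The percentile method is the simplest choice; other bootstrap intervals (such as bias-corrected ones) would follow the same resampling loop with a different final step.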

Articles published on Neural Sequence Models

Spatial-Temporal Transformer Networks for Traffic Flow Forecasting Using a Pre-Trained Language Model.

Feature Enhanced Spatial–Temporal Trajectory Similarity Computation

N-gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding

Dynamic multi-fusion spatio-temporal graph neural network for multivariate time series forecasting

Interpretable Quantum Advantage in Neural Sequence Learning

A comparative study of model-centric and data-centric approaches in the development of cardiovascular disease risk prediction models in the UK Biobank.

A hybrid deep learning approach for phenotype prediction from clinical notes

Document-Level Event Role Filler Extraction Using Key-Value Memory Network

Detecting Elevated Air Pollution Levels by Monitoring Web Search Queries: Algorithm Development and Validation

Symbolic Brittleness in Sequence Models: On Systematic Generalization in Symbolic Mathematics

Neural Translation and Automated Recognition of ICD-10 Medical Entities From Natural Language: Model Development and Performance Assessment.

Xatu: Richer Neural Network Based Prediction for Video Streaming

MLE-Guided Parameter Search for Task Loss Minimization in Neural Sequence Modeling

Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs

Semantic Service Clustering With Lightweight BERT-Based Service Embedding Using Invocation Sequences

Theoretical Limitations of Self-Attention in Neural Sequence Models

Family History Information Extraction With Neural Attention and an Enhanced Relation-Side Scheme: Algorithm Development and Validation.

Span-Based Neural Buffer: Towards Efficient and Effective Utilization of Long-Distance Context for Neural Sequence Models

System Identification with Time-Aware Neural Sequence Models
