News Corpus Research Articles

Active learning (AL) is widely used to reduce the time and labor costs of data annotation. By querying and selecting specific instances to train the model, it maximizes task performance within a limited number of iterations. However, little work has been done to fully fuse features from different hierarchies to enhance the effectiveness of active learning. Inspired by the idea of information compensation in well-known deep learning models (such as ResNet), this work proposes a novel TextCNN-based Two ways Active Learning model (TCTWAL) to extract task-relevant texts. TextCNN requires little hyper-parameter tuning, works with static vectors, and achieves excellent results on various natural language processing (NLP) tasks, which also benefits human-computer interaction (HCI) and AL-related tasks. In the proposed framework, TCTWAL measures candidate texts using both global and local features, relying on a modified TextCNN. In addition, the query strategy is strengthened by maximum normalized log-probability (MNLP), which is sensitive to longer sentences. The selected instances are characterized simultaneously by general global information and abundant local features. To validate the effectiveness of the proposed model, extensive experiments are conducted on three widely used text corpora, and the results are compared with eight manually designed instance query strategies. The results show that the method outperforms the baselines in terms of accuracy, macro precision, macro recall, and macro F1 score. In particular, for classification on the AG’s News corpus, the improvements in the four indicators after 39 iterations are 40.50%, 45.25%, 48.91%, and 45.25%, respectively.
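
The MNLP query strategy mentioned above can be illustrated with a minimal sketch. It assumes the common formulation in which a candidate's score is the mean of the log-probabilities the model assigns to its predicted tokens, and the candidates with the lowest scores are queried first. The function names, the pool layout, and the probabilities are hypothetical; the abstract does not say how TCTWAL combines this score with its global and local features.

```python
import math

def mnlp_score(token_log_probs):
    """Length-normalized log-probability: the mean of the per-token values."""
    if not token_log_probs:
        raise ValueError("cannot score an empty sequence")
    return sum(token_log_probs) / len(token_log_probs)

def select_queries(pool, k):
    """Return the ids of the k candidates with the lowest MNLP score.

    `pool` is a list of (text_id, token_log_probs) pairs. This layout is an
    assumption made for the sketch, not something taken from the paper.
    """
    ranked = sorted(pool, key=lambda item: mnlp_score(item[1]))
    return [text_id for text_id, _ in ranked[:k]]

# Hypothetical model outputs: log-probabilities of the predicted tokens.
pool = [
    ("short_confident", [math.log(0.9), math.log(0.9)]),
    ("long_confident", [math.log(0.9)] * 10),
    ("short_uncertain", [math.log(0.5), math.log(0.4)]),
]
queries = select_queries(pool, k=1)
```

Because the sum is divided by the sequence length, the two confident candidates receive essentially the same score despite their different lengths, and the uncertain one is the candidate selected for annotation.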

Precise logistic support is essential after a disaster occurs. It must be timely, accurate, targeted, and based on existing needs. However, obtaining sufficient and accurate information about logistic distribution locations remains a key problem, and Named Entity Recognition (NER) can help address it. In recent years, news coverage through Indonesian digital news media and social media accounts has emerged as a promising source for building a disaster data corpus. This study implemented NER to extract and identify named entities from text-based information, particularly from Indonesian digital news media. In addition to the regular entities of the NER standard, this study introduced new entities specialized for disaster-related information, including DISASTER, SCALE, SUPPLIES, CASUALTIES, and OUTSIDE. The resulting Indonesian-language disaster corpus for the NER model had an imbalanced composition. To overcome this problem, random oversampling was applied. This study also used a BiLSTM model to recognize each entity in new textual information and evaluated its performance when the proposed Indonesian disaster corpus was used as training data for the deep learning model. Several optimization algorithms for the BiLSTM were evaluated. The results showed improved BiLSTM performance with Adam optimization and a balanced corpus, reaching 93.4%, 82.4%, and 87.5% for precision, recall, and F1-score, respectively. The BiLSTM network captured long-range dependencies in the sequential data used for NER. Oversampling helped the proposed NER model recognize all entities precisely and reduced biased results. Thus, the BiLSTM method can better identify entities in the textual corpus of Indonesian disaster-related online news.
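
The random oversampling step described above can be sketched as follows. This is a minimal illustration that assumes each training sample carries a single label and that minority classes are duplicated at random until every class matches the largest one. The function name, the seed, and the toy sentences are hypothetical; the abstract does not state whether the study oversampled tokens or whole sentences.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    out_samples, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

# Toy data using entity names from the abstract; the sentences are invented.
samples = ["flood hit", "quake 6.1", "rice sent", "flood again", "flood rises"]
labels = ["DISASTER", "SCALE", "SUPPLIES", "DISASTER", "DISASTER"]
balanced_samples, balanced_labels = random_oversample(samples, labels)
counts = Counter(balanced_labels)
```

After the call, `counts` holds the same number of samples for every label, and the balanced set contains only copies of the original samples, not synthetic ones.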

Articles published on News Corpus

Fake news detection and corpus establishment from comment data for social network posts

Uzbek news corpus for named entity recognition

The news values of fake news

Coverage of Political Unrest in Pakistani English Newspapers: A Corpus-Based Content Analysis

A dual-ways feature fusion mechanism enhancing active learning based on TextCNN

Amina: an Arabic multi-purpose integral news articles dataset

Real-Time Extraction of News Events Based on BERT Model

Public Behavior and Emotion Correlation Mining Driven by Aspect From News Corpus

Forecasting Inflation Using Economic Narratives

Indonesian disaster named entity recognition from multi source information using bidirectional LSTM (BiLSTM)

Shame-Sensitive Public Health

Corpus-based Multi-dimensional Analysis of the Register Features of Financial News

The Conceptual Metaphor PARENTS ARE ANIMALS: On Linguistic Terms Used Figuratively for Types of Parenting

Aesthetic Plastic Surgery Issues During the COVID-19 Period Using Topic Modeling

Topic Mining and Evolution of U.S. Mainstream Media Reporting on “Belt and Road Initiative” (2013-2023)

Limitations of Large Language Models in Propaganda Detection Task

A study on deep learning for Vietnamese text classification

What if migrants were only people and relatives? Designations used to name people on the move in the Belgian media

From bonus to burden: The cost of ruling from a new(s) perspective

Variasi dan Komponen Makna Verba Pewarta pada Korpus Berita Daring
