Sentence Classification Research Articles

Online social networks (OSNs) are frequently used to carry out cyber-criminal activities such as cyberbullying. As a developing Asian country that keeps abreast of ICT advancement, Malaysia is no exception when it comes to cyberbullying. The Author Identification (AI) task plays a vital role in social media forensic investigation (SMF), unveiling the genuine identity of an offender by analysing the OSN text written by the candidate culprits. AI on OSN text faces several challenges, including limited text length and informal language full of internet jargon and grammatical errors, which further degrade AI performance in SMF. Traditional AI systems designed for long text documents appear inadequate for analysing the writing style of short OSN texts. N-gram features have been shown to represent authors' writing style efficiently for short text. However, representing N-grams with traditional schemes such as TF-IDF produces sparse vectors that struggle to capture the semantic information in the text. Moreover, most AI work has been done in English, while indigenous languages have received less attention. In East Malaysia, the dominant languages that transcend ethnic boundaries are Iban in Sarawak and KadazanDusun in Sabah, both of which are inherently under-resourced. This paper presents a proposed AI workflow for short OSN text in two under-resourced languages (U-RLs), using Iban and KadazanDusun tweets, to help curb cyberbullying in Malaysia. It compares TF-IDF (sparse) and state-of-the-art (SoA) embedding-based (dense) feature representations to determine which best captures the stylistic features of the authors' writing. Word, character, and POS N-grams were extracted as features. The representations were learned by several machine learning classifiers (Naïve Bayes, Random Forest, and SVM), and a convolutional neural network (CNN), an SoA deep learning model for sentence classification, was tested against these traditional classifiers.
Results were obtained by combining the different representation models and classifiers on three datasets (English, Iban, and KadazanDusun). The best results were achieved when the CNN learned the embedding-based models with a combination of all features: KadazanDusun achieved the highest accuracy at 95.76%, followed by English (95.02%) and Iban (94%).
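To make the sparse-versus-dense contrast concrete, the sketch below builds character-trigram TF-IDF vectors in plain Python and measures how many matrix cells are zero. It is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `char_ngrams`, `tfidf`, and `sparsity`, the trigram size, the smoothed IDF formula, the length-normalised term frequency, and the toy English posts are all choices made here for illustration. The study's word- and POS-level N-grams and its embedding-based models are not shown.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams; a space is padded on each side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf(docs: list[str], n: int = 3) -> tuple[list[str], list[list[float]]]:
    """Return the sorted n-gram vocabulary and one TF-IDF row per document."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    vocab = sorted(set().union(*counts))
    num_docs = len(docs)
    # Document frequency: how many documents contain each n-gram.
    df = {g: sum(1 for c in counts if g in c) for g in vocab}
    # Smoothed IDF (one common variant, assumed here): log((1 + N) / (1 + df)) + 1.
    idf = {g: math.log((1 + num_docs) / (1 + df[g])) + 1 for g in vocab}
    rows = []
    for c in counts:
        total = sum(c.values()) or 1
        # Term frequency normalised by document length, weighted by IDF.
        # Counter returns 0 for absent n-grams, giving a zero cell.
        rows.append([c[g] / total * idf[g] for g in vocab])
    return vocab, rows


def sparsity(rows: list[list[float]]) -> float:
    """Fraction of matrix cells that are exactly zero."""
    cells = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for v in r if v == 0.0)
    return zeros / cells if cells else 0.0


if __name__ == "__main__":
    # Hypothetical short posts standing in for tweets.
    posts = ["good morning all", "see you tonight", "good night all"]
    vocab, rows = tfidf(posts)
    print(len(vocab), "trigrams; sparsity =", round(sparsity(rows), 2))
```

Because each short post activates only a handful of the n-grams in the shared vocabulary, most entries are zero, and that fraction tends to grow as more authors and posts are added. Dense embedding-based representations sidestep this by mapping every text to a fixed, low-dimensional vector.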

Background Natural language processing (NLP) is an emerging technology in health care that leverages the large amount of free-text data in electronic health records to improve patient care, support clinical decisions, and facilitate clinical and translational science research. Recently, deep learning has achieved state-of-the-art performance in many clinical NLP tasks. However, training deep learning models often requires large annotated data sets, which are normally not publicly available and can be time-consuming to build in clinical domains. Working with smaller annotated data sets is typical in clinical NLP; therefore, ensuring that deep learning models perform well in this setting is crucial for real-world clinical NLP applications. A widely adopted approach is fine-tuning existing pretrained language models, but these attempts fall short when the training data set contains only a few annotated samples. Few-shot learning (FSL) has recently been investigated to tackle this problem. Siamese neural networks (SNNs) have been widely used as an FSL approach in computer vision but have not been well studied in NLP, and the literature on their applications in clinical domains is scarce. Objective The aim of our study is to propose and evaluate SNN-based approaches for few-shot clinical NLP tasks. Methods We propose 2 SNN-based FSL approaches: pretrained SNN and SNN with second-order embeddings. We evaluate them on a clinical sentence classification task in 3 few-shot settings: 4-shot, 8-shot, and 16-shot learning. The task is benchmarked using 4 pretrained language models: bidirectional encoder representations from transformers (BERT), BERT for biomedical text mining (BioBERT), BioBERT trained on clinical notes (BioClinicalBERT), and generative pretrained transformer 2 (GPT-2). We also compare the performance of the SNN-based approaches with that of the prompt-based GPT-2 approach.
Results In 4-shot sentence classification tasks, GPT-2 had the highest precision (0.63), but its recall (0.38) and F score (0.42) were lower than those of BioBERT-based pretrained SNN (0.45 and 0.46, respectively). In both 8-shot and 16-shot settings, SNN-based approaches outperformed GPT-2 in all 3 metrics of precision, recall, and F score. Conclusions The experimental results verified the effectiveness of the proposed SNN approaches for few-shot clinical NLP tasks.
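The core inference step behind an SNN-based few-shot classifier can be sketched as follows: the query sentence and every support sentence pass through the same encoder (the weights shared by the two Siamese branches), and the query takes the label whose support examples it most resembles. This is a hedged toy illustration, not the study's method. The bag-of-words `encode` function is a hypothetical stand-in for a trained encoder such as the BioBERT-based models the abstract mentions, `cosine` and `classify` are names chosen here, the pairwise training stage is not shown, and the labels and sentences in the usage example are invented.

```python
import math
from collections import Counter


def encode(text: str) -> dict[str, float]:
    """Hypothetical stand-in encoder: an L2-normalised bag of words.

    In a real SNN this would be a trained network whose weights are
    shared by both branches; here it has no learned parameters.
    """
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def classify(query: str, support: dict[str, list[str]]) -> str:
    """Assign the label whose k support sentences are most similar on average."""
    q = encode(query)  # same encoder as the support set: the "Siamese" part

    def mean_similarity(examples: list[str]) -> float:
        return sum(cosine(q, encode(e)) for e in examples) / len(examples)

    return max(support, key=lambda label: mean_similarity(support[label]))


if __name__ == "__main__":
    # Hypothetical 2-shot support set with invented labels and sentences.
    support = {
        "medication": ["patient started on metformin", "continue aspirin daily"],
        "symptom": ["patient reports chest pain", "complains of shortness of breath"],
    }
    print(classify("started metformin and aspirin", support))  # medication
    print(classify("reports pain in chest", support))  # symptom
```

Moving to the 4-, 8-, and 16-shot settings described in the abstract only changes how many sentences each support list holds. Averaging similarities over the support set, rather than taking the single nearest example, is one design choice among several.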

Articles published on Sentence Classification

TOWARDS CURBING CYBER-BULLYING IN MALAYSIA BY AUTHOR IDENTIFICATION OF IBAN AND KADAZANDUSUN OSN TEXT USING DEEP LEARNING

Biomedical Abstract Sentence Classification by BERT-Based Reading Comprehension

News text classification using Long-Term Short Memory (LSTM) algorithm

An unsupervised linguistic-based model for automatic glossary term extraction from a single PDF textbook

Few-Shot Learning for Clinical Natural Language Processing Using Siamese Neural Networks: Algorithm Development and Validation Study

QNLP in Practice: Running Compositional Models of Meaning on a Quantum Computer

SIMPLE SENTENCE IN A·WE

Synwmd: Syntax-aware word Mover’s distance for sentence similarity evaluation

A Neuro Symbolic Approach for Contradiction Detection in Persian Text

Joint Syntax-Enhanced and Topic-Driven Graph Networks for Emotion Recognition in Multi-Speaker Conversations

Art of Use of Exclusive Language in the Epic of Khing Ju of the Ede People in the Central Highlands, Vietnam

A deep penetration network for sentence classification

Grammar-aware sentence classification on quantum computers

An Approach to Summarizing Product Reviews

Automatic recognition and classification of future work sentences from academic articles in a specific domain

Monitoring Indonesian online news for COVID-19 event detection using deep learning

Label informed hierarchical transformers for sequential sentence classification in scientific abstracts

Exploiting All Samples in Low-Resource Sentence Classification: Early Stopping and Initialization Parameters

Classification of the elliptical sentences in the question-answer dialogue unity in modern English language communication

Verbless predicative structures in Persian paroemias
