Performance Of Classifier Research Articles

Addressing the challenge of toxic language in online discussions is crucial for the development of effective toxicity detection models. This pioneering work focuses on addressing imbalanced datasets in toxicity detection by introducing a novel approach to augment toxic language data. We create a balanced dataset by instructing fine-tuning of Large Language Models (LLMs) using Reinforcement Learning with Human Feedback (RLHF). Recognizing the challenges in collecting sufficient toxic samples from social media platforms for building a balanced dataset, our methodology involves sentence-level text data augmentation through paraphrasing existing samples using optimized generative LLMs. Leveraging generative LLM, we utilize the Proximal Policy Optimizer (PPO) as the RL algorithm to fine-tune the model further and align it with human feedback. In other words, we start by fine-tuning a LLM using an instruction dataset, specifically tailored for the task of paraphrasing while maintaining semantic consistency. Next, we apply PPO and a reward function, to further fine-tune (optimize) the instruction-tuned LLM. This RL process guides the model in generating toxic responses. We utilize the Google Perspective API as a toxicity evaluator to assess generated responses and assign rewards/penalties accordingly. This approach guides LLMs through PPO and the reward function, transforming minority class samples into augmented versions. The primary goal of our methodology is to create a balanced and diverse dataset to enhance the accuracy and performance of classifiers in identifying instances from the minority class. Utilizing two publicly available toxic datasets, we compared various techniques with our proposed method for generating toxic samples, demonstrating that our approach outperforms all others in producing a higher number of toxic samples. Starting with an initial 16,225 toxic prompts, our method successfully generated 122,951 toxic samples with a toxicity score exceeding 30%. Subsequently, we developed various classifiers using the generated balanced datasets and applied a cost-sensitive learning approach to the original imbalanced dataset. The findings highlight the superior performance of classifiers trained on data generated using our proposed method. These results highlight the importance of employing RL and a data-agnostic model as a reward mechanism for augmenting toxic data, thereby enhancing the robustness of toxicity detection models.

Abstract Background Heart rate (HR) tracking by wrist-worn devices using photoplethysmography (PPG) could assist in continuously following up physical activity. However, the accuracy can be impacted by (motion) artefacts. Machine learning models could help to recognise artefacts in PPG-based HR data. The choice of classifier in these machine learning models is a determing factor for task performance of the model. Purpose This study evaluates and determines the optimal classifier for a new machine learning-based approach to enhance the reliability of artefact detection in PPG-based HR data. Methods A total of 62 participants (27 cardiac rehabilitation patients, 35 healthy athletes) wore both a test device and a reference device measuring HR continuously for 24 hours. A training dataset was prepared, assigning two independent labels (i.e. anomaly and activity) to each HR episode based on the reference device data. Fitbit data were processed using our in-house designed artefact removal procedure, which involves the application of two classification models: one for anomaly detection and another for activity detection. Four distinct classifiers were employed for both models: Balanced Bagging, Balanced Bagging with Random Forest, Balanced Random Forest, and Logistic Regression. Each classifier was evaluated using area under the receiver operating characteristic curve (ROC-AUC), accuracy, sensitivity and specificity. Results Of the 1,647,328 HR data points collected, 103,095 (6.26%) were identified as artefacts. Figure 1 and Figure 2 summarise the performance of the distinct classifiers for the anomaly model and the activity model, respectively. Balanced Bagging and Balanced Bagging with Random Forest consistently demonstrate the highest AUC values and accuracies across both anomaly and activity detection models (anomaly detection: AUC = 0.95, accuracy = 89-85%; activity detection: AUC = 0.98, accuracy = 95%). Comparing these two, Balanced Bagging with Random Forest emerges as the preferred option, given the highest sensitivity in both anomaly detection (93%&gt;86%) and activity detection models (99%&gt;96%). In contrast, Balanced Random Forest and Logistic Regression exhibit inferior performance. In the anomaly detection model, Balanced Random Forest exhibits a lower sensitivity of 75%, while Logistic Regression performs even worse with a sensitivity of 25%. Similarly, in the activity detection model, both Balanced Random Forest and Logistic Regression demonstrate diminished performance. Conclusions Balanced Bagging with Random Forest emerges as the optimal classifier to detect anomalies and activities in continuous PPG-based HR data, thus contributing to the optimisation of our in-house designed procedure for removing artefacts. This processing aims to provide a reliable and automatic way for continuous HR monitoring, which can help monitor and guide physical activities.

Performance Of Classifier Research Articles

Related Topics

Articles published on Performance Of Classifier

AugmenToxic: Leveraging Reinforcement Learning to Optimize LLM Instruction Fine-Tuning for Data Augmentation to Enhance Toxicity Detection

Sentiment classification for insider threat identification using metaheuristic optimized machine learning classifiers

Optimisation of artefact detection in photoplethysmography heart rate data: influence of different classifiers in machine learning models

A Strategy for Predicting the Performance of Supervised and Unsupervised Tabular Data Classifiers

Automatic Negation Detection for Semantic Analysis in Arabic Hotel Reviews Through Lexical and Structural Features: A Supervised Classification

Enhancing Activity Recognition After Stroke: Generative Adversarial Networks for Kinematic Data Augmentation

Integrating standard epilepsy protocol, ASL-perfusion, MP2RAGE/EDGE and the MELD-FCD classifier in the detection of subtle epileptogenic lesions: a 3 Tesla MRI pilot study.

Conceptualizing bias in EHR data: A case study in performance disparities by demographic subgroups for a pediatric obesity incidence classifier.

G4 & the balanced metric family – a novel approach to solving binary classification problems in medical device validation & verification studies

Exploring the role of project status information in effective code smell detection

Assessing the influence of latency variability on EEG classifiers - a case study of face repetition priming

Semi-supervised comparative learning compensation method for chemical gas sensor drift.

AI-DPAPT: a machine learning framework for predicting PROTAC activity.

Large-scale structural covariance networks changes relate to executive function deficit in betel quid-dependent chewers.

SWL-LSE: A Dataset of Health-Related Signs in Spanish Sign Language with an ISLR Baseline Method

Improving acoustic species identification using data augmentation within a deep learning framework

Leveraging Classifier Performance Using Heuristic Optimization for Detecting Cardiovascular Disease from PPG Signals.

Enhancing Intent Classifier Training with Large Language Model-generated Data

CGAN Facilitated Data Augmentation of Voice and Speech Parameters for Detecting Parkinson’s Disease in the Prodromal Phase

Habitat-based CT radiomics enhances the ability to predict spread through air spaces in stage T1 invasive lung adenocarcinoma.

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Performance Of Classifier Research Articles

Related Topics

Articles published on Performance Of Classifier

AugmenToxic: Leveraging Reinforcement Learning to Optimize LLM Instruction Fine-Tuning for Data Augmentation to Enhance Toxicity Detection

Sentiment classification for insider threat identification using metaheuristic optimized machine learning classifiers

Optimisation of artefact detection in photoplethysmography heart rate data: influence of different classifiers in machine learning models

A Strategy for Predicting the Performance of Supervised and Unsupervised Tabular Data Classifiers

Automatic Negation Detection for Semantic Analysis in Arabic Hotel Reviews Through Lexical and Structural Features: A Supervised Classification

Enhancing Activity Recognition After Stroke: Generative Adversarial Networks for Kinematic Data Augmentation

Integrating standard epilepsy protocol, ASL-perfusion, MP2RAGE/EDGE and the MELD-FCD classifier in the detection of subtle epileptogenic lesions: a 3 Tesla MRI pilot study.

Conceptualizing bias in EHR data: A case study in performance disparities by demographic subgroups for a pediatric obesity incidence classifier.

G4 & the balanced metric family – a novel approach to solving binary classification problems in medical device validation & verification studies

Exploring the role of project status information in effective code smell detection

Assessing the influence of latency variability on EEG classifiers - a case study of face repetition priming

Semi-supervised comparative learning compensation method for chemical gas sensor drift.

AI-DPAPT: a machine learning framework for predicting PROTAC activity.

Large-scale structural covariance networks changes relate to executive function deficit in betel quid-dependent chewers.

SWL-LSE: A Dataset of Health-Related Signs in Spanish Sign Language with an ISLR Baseline Method

Improving acoustic species identification using data augmentation within a deep learning framework

Leveraging Classifier Performance Using Heuristic Optimization for Detecting Cardiovascular Disease from PPG Signals.

Enhancing Intent Classifier Training with Large Language Model-generated Data

CGAN Facilitated Data Augmentation of Voice and Speech Parameters for Detecting Parkinson’s Disease in the Prodromal Phase

Habitat-based CT radiomics enhances the ability to predict spread through air spaces in stage T1 invasive lung adenocarcinoma.