Related Topics

  • Imbalanced Data Classification
  • Class Imbalance Problem
  • Imbalanced Datasets
  • Class Imbalance
  • Imbalanced Learning
  • Imbalance Problem

Articles published on Imbalanced data

9944 Search results
  • New
  • Research Article
  • 10.1016/j.puhe.2026.106200
Artificial intelligence innovations in substance use prevention on social media: A scoping review.
  • May 1, 2026
  • Public health
  • Van Thanh Nguyen + 4 more

  • New
  • Research Article
  • 10.1016/j.jhazmat.2026.141981
From pollutant profiling to source attribution: An interpretable staged machine learning framework for sewer surveillance.
  • May 1, 2026
  • Journal of hazardous materials
  • Jia-Qiang Lv + 11 more

  • New
  • Research Article
  • 10.1016/j.eswa.2026.131374
AWBIFS: An incremental fusion system for arrhythmia recognition on imbalanced ECG data with adaptive weighting
  • May 1, 2026
  • Expert Systems with Applications
  • Yaqin Zhao + 6 more

  • New
  • Research Article
  • Cited by: 1
  • 10.1016/j.inffus.2025.104005
Mitigating class imbalance in forest fire prediction with GAN-Augmented data fusion
  • May 1, 2026
  • Information Fusion
  • Vishal Krishna Singh + 4 more

  • This work presents a novel idea in the field of forest fire detection and addresses the critical limitations of existing bias mitigation approaches.
  • The proposed approach is able to handle the complex interaction of environmental factors and adapts quickly to rapidly changing forest fire scenarios.
  • The proposed approach uses the complex relationships seen between meteorological variables, generative adversarial networks and data fusion to mitigate bias.
  • The proposed approach addresses comprehensive bias mitigation through the analysis of both high-level and low-level image features, which in turn significantly improves the specificity and accuracy of forest fire detection.

Imbalanced data sets exacerbate recognition biases in forest fire prediction models, as disproportionate representation of class instances leads to skewed results. Existing work on bias mitigation has limited ability to generalize and extract features specific to forest fires. Internet of Things (IoT)-based sensor networks can provide real-time, granular data on environmental factors such as temperature, humidity, and soil moisture, helping to capture the dynamic nature of forest conditions and alleviate data imbalance. To address these challenges, this work introduces a novel hybrid approach that explores complex probabilistic relationships among environmental factors, incorporating IoT-driven data and using a generative adversarial network (GAN) to synthetically augment minority classes. The proposed model is validated on publicly available datasets, and the performance is reported on evaluation metrics such as accuracy, precision, recall, F1-score, computational efficiency and training cost. The results show that the proposed hybrid model achieves a significant improvement over the existing methods, with a classification accuracy of 95.08%, a precision of 93.03%, a recall of 92.80%, and an F1-score of 92.91%.
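
As a quick consistency check on the figures reported in this abstract, and assuming the F1-score is the harmonic mean of the reported precision P and recall R (an assumption; if the metrics were averaged per class, the identity need not hold exactly), the numbers agree with each other:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.9303 \times 0.9280}{0.9303 + 0.9280}
    \approx 0.9291
```

This matches the stated F1-score of 92.91%.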

  • New
  • Research Article
  • 10.1016/j.future.2025.108315
LLM-APTDS: A high-precision advanced persistent threat detection system for imbalanced data based on large language models with strong interpretabilit
  • May 1, 2026
  • Future Generation Computer Systems
  • Longjing Yang + 4 more

  • New
  • Research Article
  • 10.1016/j.asoc.2026.114866
A novel federated adaptive hybrid sampling framework for imbalanced data classification
  • May 1, 2026
  • Applied Soft Computing
  • Zhuo Zhao + 3 more

  • New
  • Research Article
  • 10.1016/j.iot.2026.101923
Adaptive multi-view transformer ensemble for intrusion detection: Addressing data imbalance and enhancing attack classification
  • May 1, 2026
  • Internet of Things
  • Md Mehedi Hasan + 4 more

Network intrusion detection systems (IDS) face persistent challenges with imbalanced datasets, limited effectiveness against zero-day attacks, and inconsistent performance across diverse attack vectors. This paper presents the Adaptive Multi-View Transformer Ensemble for Intrusion Detection (AMTE-IDS), a comprehensive framework that addresses these limitations through innovative integration of advanced data balancing, multi-perspective feature learning, and dynamic ensemble classification. We introduce a Multi-Modal Wasserstein GAN with Gradient Penalty (MM-WGAN-GP) architecture employing multiple critics with complementary perspectives to generate high-quality synthetic samples for minority attack classes. Our Multi-View Feature Learning module extracts complementary representations of network traffic through specialized transformer-based pathways focusing on global features, temporal patterns, and protocol-specific characteristics. A Dynamic Ensemble Detection module adaptively combines specialized classifiers based on input characteristics, enabling effective detection across diverse attack vectors while maintaining robust performance against evolving threats. Extensive experimentation on NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets demonstrates that AMTE-IDS achieves 97.8% overall accuracy with 73.2% F1-score for minority classes, outperforming state-of-the-art MCGC-IDS by +0.9%/+2.4% respectively (p < 0.001), with 57.1% false positive rate reduction and 0.35ms per-sample inference latency confirming real-time deployment viability. The framework demonstrates strong generalization across different network environments and attack patterns, offering a promising approach for addressing the complex challenges of modern network security.

  • New
  • Research Article
  • 10.1016/j.xphs.2026.104225
Improved sub-visible particle classification in flow imaging microscopy via generative AI-based image synthesis.
  • May 1, 2026
  • Journal of pharmaceutical sciences
  • Utku Ozbulak + 4 more

Sub-visible particle analysis using flow imaging microscopy combined with deep learning has proven effective in identifying particle types, enabling the distinction of harmless components such as silicone oil from protein particles. However, the scarcity of available data and severe imbalance between particle types within datasets remain substantial hurdles when applying multi-class classifiers to such problems, often forcing researchers to rely on less effective methods. The aforementioned issue is particularly challenging for particle types that appear unintentionally and in lower numbers, such as silicone oil and air bubbles, as opposed to protein particles, where obtaining large numbers of images through controlled settings is comparatively straightforward. In this work, we develop a state-of-the-art diffusion model to address data imbalance by generating high-fidelity images that can augment training datasets, enabling the effective training of multi-class deep neural networks. We validate this approach by demonstrating that the generated samples closely resemble real particle images in terms of visual quality and structure. To assess the effectiveness of using diffusion-generated images in training datasets, we conduct large-scale experiments on a validation dataset comprising 500,000 protein particle images and demonstrate that this approach improves classification performance with no observable downside. Finally, to promote open research and reproducibility, we publicly release both our diffusion models and the trained multi-class deep neural network classifiers, along with a straightforward interface for easy integration into future studies, at https://github.com/utkuozbulak/svp-generative-ai.

  • New
  • Research Article
  • 10.30574/wjarr.2026.30.1.0962
Enhancing DDoS Detection in Cloud Computing Environment Through Effective Feature Selection With SMOTE
  • Apr 30, 2026
  • World Journal of Advanced Research and Reviews
  • Ogah Stephen Ugbowu + 2 more

The growing reliance on internet-based services and the increasing sophistication of cyber threats have made network security a crucial concern in modern-day computing. These attacks can disrupt operations, result in financial losses, damage reputations, and undermine trust in digital services. Distributed denial of service (DDoS) attacks have emerged as a critical challenge for cloud computing, impacting service availability and raising concerns among providers. Despite cloud computing's scalable and flexible architecture, its vulnerabilities make it an attractive target for attackers. This paper presents a comprehensive survey of DDoS attacks in cloud environments, focusing on detection mechanisms leveraging the Synthetic Minority Oversampling Technique (SMOTE). The paper analyzes the cloud computing characteristics exploited by attackers and discusses effective anomaly detection approaches. Solutions based on SMOTE, encompassing detection parameters, metrics and features, were reviewed for their ability to enhance security with high accuracy and low computational costs. The results present 39 different feature selections, as depicted in Table 2. The paper recommends that different feature selection and resampling techniques be studied toward developing a faster system for identifying imbalanced data for DDoS attack detection.

  • New
  • Research Article
  • 10.1108/ijicc-09-2025-0647
Accelerating classification in large-scale and imbalanced datasets: a hybrid ANN approach
  • Apr 28, 2026
  • International Journal of Intelligent Computing and Cybernetics
  • Özge H Namlı + 5 more

Purpose: This study proposes a novel hybrid artificial neural network (H-ANN) framework, inspired by reinforcement learning (RL), to proactively detect Internet connection speed problems using enriched datasets from multiple sources of an Internet service provider.

Design/methodology/approach: The problem is challenging due to the high dimensionality, imbalanced class distribution and continuous influx of new data. To address these issues, the proposed hybrid framework integrates supervised learning methods, namely a radial basis function network (RBFN) and a multi-layer perceptron (MLP), with the unsupervised self-organizing map (SOM). RL is employed to accelerate learning, reduce feature and instance space complexity and improve the detection of underrepresented classes. The framework is first validated on benchmark open-source datasets and subsequently applied to real-world company databases combining network, business and customer information.

Findings: The results demonstrate that the proposed H-ANN significantly improves both classification accuracy and computational efficiency compared to conventional machine learning approaches. Importantly, the framework enables the early identification of slow Internet connections before customers submit complaints, allowing the service provider to take proactive measures.

Originality/value: The proposed H-ANN framework not only enables the early identification of slow Internet connections before customers submit complaints (allowing service providers to take proactive measures) but also offers a generalizable solution for large-scale, imbalanced and dynamic data classification problems across diverse domains.

  • New
  • Research Article
  • 10.21015/vtse.v14i2.2328
Research Trends on Sentiment Analysis and Imbalanced Data Handling in Fake Review Detection: A Systematic Literature Review
  • Apr 26, 2026
  • VFAST Transactions on Software Engineering
  • Leena Ardini Abdul Rahim + 5 more

Fake reviews are deceptive evaluations that mislead customers rather than reflect genuine customer experiences. These reviews can damage a business's reputation by deceiving customers, leading them to make poor purchasing decisions and diminishing trust in e-commerce platforms. Detecting fake reviews is crucial for e-commerce platforms to maintain their integrity, protect consumers, and uphold business reputations. Despite its importance, there is a paucity of comprehensive research addressing fake review detection through the lenses of Sentiment Analysis (SA) and imbalanced data handling. To bridge this gap, a systematic literature review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. This review analyzed 43 studies from the Scopus and Web of Science databases, covering the period from 2019 to 2024. Three primary themes emerged: SA levels, detection methods, and techniques for handling imbalanced data, which further branched into 28 sub-themes. The analysis revealed key trends such as a predominant focus on document-level SA, the application of machine learning approaches, and data resampling techniques to address imbalanced datasets. The review underscored the necessity for more research on aspect-level analysis and the development of combinational approaches, such as hybrid models, to enhance the accuracy and reliability of fake review detection. These insights provide valuable guidance for researchers, data scientists, and developers seeking to advance the field.

  • New
  • Research Article
  • 10.1007/s13042-026-03095-4
Mst-bézier: an oversampling method for imbalanced data learning based on Bézier curves guided by minimum spanning trees
  • Apr 24, 2026
  • International Journal of Machine Learning and Cybernetics
  • Xinyu Chen + 1 more

  • New
  • Research Article
  • 10.62643/ijerst.2026.v22.n2(2).2917
Hybrid Transformer-Ensemble Framework for Global Disease Outbreak Recognition and Pattern Modeling
  • Apr 23, 2026
  • International Journal of Engineering Research and Science & Technology
  • V Bharathi + 5 more

The increasing availability of digital healthcare data, particularly clinical text such as electronic health records and medical transcriptions, has created new possibilities for intelligent healthcare systems. These unstructured textual datasets contain valuable information for identifying medical specialties and supporting clinical workflows. However, due to their complex structure, domain-specific terminology, and high dimensionality, extracting meaningful insights remains a significant challenge. The main problem addressed in this work is the automatic classification of clinical text into appropriate medical specialties, which is essential for improving patient care, optimizing resource utilization, and enabling efficient clinical decision-making. Traditional systems rely on manual annotation and rule-based approaches, which are time-consuming, error-prone, and not scalable. These systems fail to capture contextual relationships within the text, leading to reduced performance on large datasets. Furthermore, basic machine learning methods depend on handcrafted features and lack the ability to understand deep semantic meaning, performing poorly on domain-specific and imbalanced healthcare data. To address these challenges, the proposed system introduces transformer-based embeddings with machine learning models. Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA) generates contextual embeddings that capture semantic relationships within clinical text. To handle class imbalance, Synthetic Minority Over-sampling Technique (SMOTE) is applied. Multiple models, including Adaptive Boosting Classifier (ABC), Random Forest Classifier (RFC), Tao Tree Classifier (TTC), and Extra Trees Classifier (ETC), are evaluated, with ETC selected as the final model due to its superior performance. The system improves accuracy, scalability, and robustness, supporting efficient healthcare analytics.

  • New
  • Research Article
  • 10.1109/jbhi.2026.3687348
Substructure-guided Deep Graph Learning in Molecular Toxicity Prediction.
  • Apr 23, 2026
  • IEEE journal of biomedical and health informatics
  • Yuting Huang + 1 more

Computational toxicity prediction has become a key component in modern drug discovery. Although machine learning or deep learning techniques have reformed this field in recent years, more in-depth studies on addressing data imbalance, missing labels, and lack of model interpretability are still needed. In this work, we develop a substructure-based deep graph learning architecture, by introducing various functional groups into the construction of molecular graphs and handling them through deep learning models. Our model, with several strategies adopted to deal with the missing labels and class imbalance in the datasets, performed well in toxicity prediction tasks. A functional group-based feature importance analysis provided further insights into different toxicity predictions and improved the interpretability of our model. It provides a solid foundation for the development of reliable toxicity prediction tools and supports rational decision-making in the drug development process.

  • New
  • Research Article
  • 10.55041/isjem.acme060
Enhanced Deep Learning Frameworks for Breast Cancer Detection: Addressing Data Imbalance, Adaptability, and Computational Efficiency
  • Apr 21, 2026
  • International Scientific Journal of Engineering and Management
  • Sireesha B + 4 more

Breast cancer diagnosis from mammogram scans is challenged by the scarcity of balanced datasets and the complexity of medical image interpretation. Existing systems predominantly utilize a deep learning framework combining ResNet50 for feature extraction and the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance. While these methods achieve high accuracy on balanced datasets (99%) and reasonable results on imbalanced data (90%), they depend heavily on pre-trained models like VGG16 and ResNet50, which limits adaptability to diverse imaging modalities. Moreover, computational demands and synthetic data generation methods such as SMOTE may not fully capture real-world variability, constraining deployment in resource-limited settings and affecting robustness. To address these limitations, the proposed system introduces an enhanced deep learning architecture incorporating domain-specific pretraining, lightweight model design, and advanced data augmentation techniques beyond SMOTE. This framework aims to improve generalization, computational efficiency, and interpretability through novel visualization tools. The benefits include more accurate, reliable, and accessible breast cancer classification across diverse datasets and clinical environments, offering a promising advancement for early detection and diagnostic support in breast cancer care.

  • New
  • Research Article
  • 10.1038/s41598-026-49801-8
Bearing fault diagnosis based on multi-branch enhanced GhostNet with adaptive focal loss.
  • Apr 21, 2026
  • Scientific reports
  • Li Zhang + 4 more

Insufficient fault samples and severe class imbalance present significant challenges for intelligent bearing fault diagnosis. To overcome these issues, this paper proposes a lightweight diagnostic approach that combines an adaptive focal loss function with a multi-branch enhanced ghost network, aiming to improve both accuracy and efficiency under limited and imbalanced data conditions. Specifically, for the small-sample problem, a Lightweight Multi-Branch Enhanced Network (LMBE) is constructed, centered on a novel Multi-Branch Enhanced Ghost Bottleneck (MBEG-bneck) designed to capture high-frequency features and enhance fault pattern recognition. For the class-imbalance problem, a Variable Focusing Class-Balanced Focal Loss (VF-CBFL) is established, which uses a tangent-based dynamic focusing mechanism to adaptively adjust the model's emphasis on hard and easy samples during training, while incorporating a class-balance factor to handle differences in sample numbers across classes. By jointly enhancing feature extraction and loss optimization, the proposed method effectively alleviates the limitations caused by small samples and data imbalance. Experiments conducted on two bearing datasets demonstrate that the proposed method achieves high diagnostic accuracy and strong generalization capability, attaining an accuracy of 98.96% under the extreme imbalance scenario of 10:1 with limited samples.
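
The idea of combining a focal term with a class-balance factor can be sketched as follows. This is a minimal illustration of the standard class-balanced focal loss with a fixed focusing parameter `gamma`, not the paper's VF-CBFL, whose tangent-based dynamic focusing is not specified in the abstract. The function names, the `beta` value and the class counts are assumptions chosen for illustration only.

```python
import math


def class_balanced_weights(counts, beta=0.999):
    """Weight each class by the inverse of its 'effective number' of samples,
    (1 - beta**n) / (1 - beta), then rescale so the weights sum to the
    number of classes."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cb_focal_loss(logits, label, weights, gamma=2.0):
    """Class-balanced focal loss for one sample.
    (1 - p_t)**gamma down-weights easy samples; weights[label] up-weights
    rare classes. gamma is fixed here (assumption), unlike the adaptive
    focusing described in the abstract."""
    p_t = softmax(logits)[label]
    return -weights[label] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical class counts with a 10:1 imbalance ratio.
counts = [1000, 100]
weights = class_balanced_weights(counts)

# The same confident prediction for class 0, scored against each label.
easy = cb_focal_loss([4.0, 0.0], 0, weights)  # correct, majority class
hard = cb_focal_loss([4.0, 0.0], 1, weights)  # wrong, minority class
```

With `gamma=0` the focal term disappears and the loss reduces to class-weighted cross-entropy, which makes the two ingredients easy to separate when experimenting.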

  • New
  • Research Article
  • 10.1088/1361-6501/ae6295
Zero-shot location of high-speed train bogie faults via non-equilibrium transformer
  • Apr 21, 2026
  • Measurement Science and Technology
  • Yiming Zhang + 5 more

Data imbalance poses a prevalent and urgent challenge in the safety management of mechanical systems. High-speed trains (HSTs), as a class of large-scale high-end equipment, are particularly vulnerable to the lack of fault data for health management. In this paper, a zero-shot learning Transformer (ZSL-Transformer) localization model is proposed for the identification of bogie faults with specific locations, where the fault data is not involved in the model training process. Specifically, the data acquisition system is first optimized to comprehensively monitor the operational status of target dampers that require attention, and an adaptive residual noise reduction module (ARNRM) is incorporated to mitigate the noise interference in the measured track spectrum. Then, the semantic interpretation of fault location is utilized to construct an attribute description matrix, which serves as auxiliary information for ZSL fault positioning. Subsequently, attribute classifiers are constructed by leveraging the Transformer's capability of modeling global information relationships, thereby establishing the fundamental framework of ZSL. Finally, through comparison with other advanced positioning algorithms, the superior performance of the proposed method is demonstrated, as it attains an average accuracy exceeding 95%.

  • New
  • Research Article
  • 10.54254/2755-2721/2026.gu32927
Template-Guided Prompting for Long-Tail Emotion Recognition
  • Apr 20, 2026
  • Applied and Computational Engineering
  • Yuqin Long

Emotion recognition (ER) poses a complex multi-class classification challenge, further complicated by significant class imbalances. In natural dialogue corpora, dominant emotions like neutral are prevalent, while minority emotions such as disgust and fear are notably scarce. This imbalance results in models consistently underperforming on less frequent categories. This paper investigates template-guided prompting as a method to improve long-tail emotion recognition using large language models (LLMs). We employ a unified evaluation framework on the MELD dataset to compare various methods: supervised baselines (TextCNN, BiLSTM), a fine-tuned pre-trained model (BERT-base), and training-free LLM inference (DeepSeek) using three structured prompt templates in both zero-shot and few-shot scenarios (K=1, 3, 5, 10). Our findings demonstrate that template-guided LLM prompting achieves the highest overall performance (Acc=0.6573, Macro-F1=0.5268) and significantly enhances minority-class F1 scores compared to all supervised baselines, without requiring parameter updates. A detailed analysis of hard-sample errors shows that 16.9% of test instances are misclassified by all five models, with minority emotions having hard-sample rates up to 48%. This bias remains even with balanced downsampling (Pearson r=0.986) and is linked to a systematic prediction bias toward the neutral class. These results imply that the difficulties in long-tail ER arise from intrinsic semantic ambiguity rather than just data imbalance, and that structured prompting offers a practical and effective solution for achieving more balanced emotion recognition.
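
The gap between accuracy and Macro-F1 reported in this abstract is typical of long-tail problems. The sketch below, using made-up labels rather than MELD data, shows how a classifier that always predicts the dominant class can score well on accuracy while Macro-F1 stays low. The convention of scoring a class with no true positives as 0.0 is an assumption of this sketch.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for a single class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 8 neutral utterances, 1 fear, 1 disgust.
y_true = ["neutral"] * 8 + ["fear", "disgust"]
# A degenerate model that always predicts the majority class.
y_pred = ["neutral"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the macro average, the two minority classes with zero F1 pull the score far below the accuracy.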

  • New
  • Research Article
  • 10.33096/ilkom.v18i1.3161.180-194
SMOTE-Based Comparative Analysis of Machine Learning Models for Stroke Risk Prediction Using Imbalanced Healthcare Data
  • Apr 20, 2026
  • ILKOM Jurnal Ilmiah
  • Ratu Mutiara Siregar + 4 more

Stroke remains one of the leading causes of mortality and long-term disability worldwide, with a significant burden in Indonesia. Early detection is crucial, as up to 90% of stroke cases are potentially preventable through timely intervention. However, predictive modeling for stroke risk is often challenged by imbalanced datasets, where non-stroke cases significantly outnumber stroke cases, potentially biasing classification models. This study aims to perform a systematic comparative evaluation of six machine learning algorithms (Logistic Regression, Decision Tree, Random Forest, Naïve Bayes, Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost)) for stroke risk prediction under imbalanced data conditions. The dataset consists of 5,110 patient records with 11 health-related features obtained from a publicly available healthcare dataset. Data preprocessing included anomaly removal, categorical encoding, feature scaling, and class balancing using the Synthetic Minority Oversampling Technique (SMOTE). Model evaluation was conducted using 5-fold cross-validation and assessed through accuracy, precision, recall, and F1-score metrics. The experimental results demonstrate that ensemble-based models outperform single classifiers. Random Forest achieved the highest mean accuracy of 97.12% (±0.42) with an F1-score of 0.96, followed closely by XGBoost with 96.85% (±0.51). Both models also exhibited superior recall performance, indicating improved minority class detection. The novelty of this study lies in the systematic evaluation of multiple machine learning models using SMOTE-based balancing and cross-validation on publicly available healthcare data, providing robust comparative insights for imbalanced medical classification problems.
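
The core step of SMOTE, creating a synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration rather than the pipeline used in the study. The function name, the toy data and the choice of `k` are assumptions, and a production setup would more likely rely on an established implementation.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority samples.
    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Hypothetical 2-D minority-class samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6)
```

When oversampling is combined with cross-validation, it is usually applied to the training folds only, so that synthetic points derived from held-out samples do not leak into the evaluation.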

  • New
  • Research Article
  • 10.1038/s41598-026-49244-1
Data augmentation of event causality identification task with pre-trained language models.
  • Apr 20, 2026
  • Scientific reports
  • Youngjoon Chun + 2 more

Event Causality Identification (ECI) is one of the main tasks in Natural Language Processing (NLP), especially for extracting causal relationships from text. Since identifying causality requires significant time and resources, we applied various data augmentation techniques to enhance data efficiency and the model's classification performance. In this context, preserving causality by maintaining the sentence's structure and context is crucial. Therefore, we propose an augmentation method that leverages the characteristics of Pre-trained Language Models (PLMs) that learn context during the masking process. To compare with PLM-based approaches, we employed various augmentation techniques such as Easy Data Augmentation (EDA), part-of-speech (POS) tagging, noise-based methods and Large Language Models (LLMs). We evaluate performance using Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA) as the downstream binary classifier that determines whether a sentence involves a causal relationship. To examine the impact of selection strategy, we compared two approaches: cosine similarity-based Top-1 selection and random selection among augmented candidates. The selected sentences were then used to train the ELECTRA classifier. In addition to standard evaluation, we design an imbalanced data scenario to assess the robustness of the proposed method under low-resource conditions. PLMs showed the highest performance and demonstrated their applicability across various environments by maintaining strong performance even in imbalanced scenarios. Through our experiments, we confirmed that PLM-based data augmentation methods achieve meaningful results in the ECI task and demonstrate the importance of preserving context and structure for predicting causality in sentences.
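
The cosine similarity-based Top-1 selection mentioned in this abstract can be illustrated with a small sketch. The abstract does not say how sentences are vectorised, so the bag-of-words representation used here is a stand-in assumption (the study may well use learned embeddings), and the example sentences and function names are hypothetical.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words vector as a token -> count mapping (stand-in assumption)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top1(original, candidates):
    """Return the augmented candidate most similar to the original sentence."""
    ref = bow(original)
    return max(candidates, key=lambda c: cosine(ref, bow(c)))


# Hypothetical original sentence and augmented candidates.
original = "the storm caused the flood"
candidates = [
    "the storm triggered the flood",
    "a flood happened yesterday",
    "rain is wet",
]
best = top1(original, candidates)
```

Selecting the closest candidate favours augmentations that keep the original structure, which is the property the abstract identifies as important for preserving causality. Random selection, the alternative compared in the study, would simply draw one candidate uniformly.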
