Emotion Classification on Software Engineering Q&A Websites

Abstract

Background: With the rapid proliferation of question-and-answer websites for software developers, such as Stack Overflow, there is an increasing need to discern developers’ emotions from their posts in order to assess the influence of these emotions on productivity, for example efficiency in bug fixing. Aim: We aimed to develop a reliable emotion classification tool capable of accurately categorizing emotions on Software Engineering (SE) websites, using data augmentation techniques to address data scarcity, because previous research has shown that tools trained on other domains can perform poorly when applied directly to the SE domain. Method: We utilized four machine learning techniques, namely BERT, CodeBERT, RFC (Random Forest Classifier), and LSTM. Taking an innovative approach to dataset augmentation, we employed word substitution, back translation, and easy data augmentation (EDA) methods. Using these, we developed sixteen unique emotion classification models: EmoClassBERT-Original, EmoClassRFC-Original, EmoClassLSTM-Original, EmoClassCodeBERT-Original, EmoClassLSTM-Substitution, EmoClassBERT-Substitution, EmoClassRFC-Substitution, EmoClassCodeBERT-Substitution, EmoClassBERT-Translation, EmoClassLSTM-Translation, EmoClassRFC-Translation, EmoClassCodeBERT-Translation, EmoClassBERT-EDA, EmoClassLSTM-EDA, EmoClassCodeBERT-EDA, and EmoClassRFC-EDA. We compared the performance of these models with state-of-the-art techniques (Multi-label SO BERT and EmoTxt) on a gold-standard dataset. Results: An initial investigation showed that models trained on the augmented datasets outperformed those trained on the original dataset. The EmoClassLSTM-Substitution, EmoClassBERT-Substitution, EmoClassCodeBERT-Substitution, and EmoClassRFC-Substitution models show improvements in average F1 score of 13%, 5%, 5%, and 10% over EmoClassLSTM-Original, EmoClassBERT-Original, EmoClassCodeBERT-Original, and EmoClassRFC-Original, respectively. EmoClassCodeBERT-Substitution performed best, outperforming Multi-label SO BERT and EmoTxt by 2.37% and 21.17%, respectively, in average F1 score. A detailed investigation of the models over 100 runs of the dataset shows that the BERT-based and CodeBERT-based models gave the best performance. However, this detailed investigation reveals no significant differences between the performance of models trained on augmented datasets and those trained on the original dataset across multiple runs. Conclusion: This research not only underlines the strengths and weaknesses of each architecture but also highlights the pivotal role of data augmentation in refining model performance, especially in the software engineering domain.
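As a rough illustration of the kind of text augmentation the abstract refers to, the sketch below implements two operations commonly associated with easy data augmentation: random swap and random deletion. The abstract does not say which operations, rates, or tokenization the authors used, so the function names (`random_swap`, `random_deletion`, `augment`), the whitespace tokenization, and the default parameters are all assumptions made for illustration, not the paper's implementation.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(text, n_copies=2, p_delete=0.1, n_swaps=1, seed=0):
    """Produce n_copies perturbed variants of text (hypothetical defaults)."""
    rng = random.Random(seed)  # fixed seed so the variants are reproducible
    tokens = text.split()      # naive whitespace tokenization (assumption)
    variants = []
    for _ in range(n_copies):
        swapped = random_swap(tokens, n_swaps, rng)
        variants.append(" ".join(random_deletion(swapped, p_delete, rng)))
    return variants


if __name__ == "__main__":
    for variant in augment("this bug has been driving me crazy all week"):
        print(variant)
```

Each variant would inherit the emotion label of the post it was derived from, which is how such methods enlarge a small labelled dataset. Whether that assumption holds for emotion-bearing words that get deleted is one reason augmented models may not always outperform the original ones.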

Similar Papers
  • Research Article
  • Citations: 80
  • 10.1016/j.cageo.2021.104855
A comparative study of machine learning and Fuzzy-AHP technique to groundwater potential mapping in the data-scarce region
  • Jun 10, 2021
  • Computers & Geosciences
  • Ranveer Kumar + 2 more

  • Research Article
  • Citations: 22
  • 10.1093/gji/ggab345
Data augmentation and its application in distributed acoustic sensing data denoising
  • Aug 24, 2021
  • Geophysical Journal International
  • Y X Zhao + 2 more

SUMMARY As a data-driven approach, the performance of deep learning models depends largely on the quantity and quality of the training data sets, which greatly limits the application of deep learning to tasks with small data sets. Unfortunately, sometimes we need to use limited small data sets to complete our tasks, such as distributed acoustic sensing (DAS) data denoising. However, using a small data set to train the network may cause overfitting, resulting in poor network generalization. To solve this problem, we propose an approach based on the combination of a generative adversarial network and a deep convolutional neural network. First, we used a small noise data set to train a generative adversarial network to generate synthetic noise samples, and then used these synthetic noise samples to augment the noise data set. Next, we used the augmented noise data set and the signal data set obtained through forward modelling to construct a synthetic training set. Finally, a denoising network based on a convolutional neural network was trained on the constructed synthetic training set. Experimental results show that the augmented data set can effectively improve the denoising performance and generalization ability of the network, and the denoising network trained on the augmented data set can more effectively reduce various kinds of noise in the DAS data.

  • Research Article
  • Citations: 9
  • 10.1109/embc48229.2022.9871654
Effective Data Augmentation, Filters, and Automation Techniques for Automatic 12-Lead ECG Classification Using Deep Residual Neural Networks.
  • Jul 11, 2022
  • Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
  • Junmo An + 2 more

Automatic electrocardiogram (ECG) analysis plays a critical role in early detection and diagnosis of cardiac abnormalities and diseases. Data augmentation and automation strategies have been proposed to enhance the robustness of machine and deep learning models for the classification of cardiac abnormalities. Here we propose 15 data augmentation methods, 6 filters, and an automation method using an end-to-end deep residual neural network (ResNet) model for automatic detection of cardiac abnormalities from 12-lead ECG recordings. We evaluate the effectiveness of the data augmentation/filtering and automation techniques using the proposed ResNet-based model on the China Physiological Signal Challenge (CPSC) dataset, which consists of 9 diagnostic classes. The average F1 scores across the 9 classes on the CPSC dataset when training with three data augmentations (baseline wander addition, dropout, and scaling) and one filter (sigmoid compression) were significantly higher than those obtained without augmentation/filters (baseline). The highest average F1 score, obtained with the sigmoid compression method, was significantly higher than the baseline (relative improvement of 2.04%), while horizontal and vertical flipping augmentations were detrimental to classification performance. Additionally, the results show that a random combination of four selected data augmentations and filters using the modified RandAugment technique provided a significantly higher average F1 score (relative improvement of 2.54%) compared to the baseline. The proposed data augmentation, filters, and automation techniques provide an effective way to improve the classification performance of the end-to-end deep learning model on ECG recordings without changing the model hyperparameters or structure.

  • Research Article
  • 10.3389/fpsyt.2025.1672178
Leveraging data augmentation for machine learning models in predicting depression and anxiety using the Revised Child Anxiety and Depression Scale clinical reports
  • Nov 27, 2025
  • Frontiers in Psychiatry
  • Saleha Noor + 5 more

Objective: An estimated 15 million people are affected by depression and anxiety in Pakistan. However, there are relatively few government mental health facilities and certified psychiatrists. This highlights the need for efficient assessments to implement intervention strategies and address these challenges. This study aims to utilize machine learning with RCADS to maximize the use of current healthcare resources and facilitate depression and anxiety screening. Methods: The dataset included 138 cases, of which 89 were retained after cleaning, with 47 RCADS items as features. Based on RCADS-47 T-scores, cases were classified as normal, borderline, or clinical, with 55% in the normal, 7% in the borderline, and 38% in the clinical range. Three feature selection methods were applied: the Chi-square test of independence, Spearman’s correlation, and Random Forest-Recursive Feature Elimination. Data augmentation was done using the probability distribution of the existing data to generate hybrid-synthetic correlated discrete multinomial variants of each RCADS item. Six commonly employed ML algorithms (Decision Tree, Random Forest, Support Vector Machine, Logistic Regression, Naive Bayes, and K-Nearest Neighbor) were trained on the original dataset; the top three models were then evaluated on augmented datasets, and the best among them was further validated on an external dataset. Results: Item 05 of the RCADS has a weak correlation with the evaluation of depression and anxiety in the study population. Augmenting the data to four times its original size was determined to be the optimal ratio for our dataset, as Random Forest yielded the best overall results, with up to 81% macro-average accuracy, precision, recall, and F1 score when tested on this data. Conclusion: The findings suggest that the Random Forest algorithm using 46 features suits the data well and has the potential to be further developed as a decision support system for the concerned professionals and to improve the usual way of screening for anxiety and depression in children and adolescents.

  • Research Article
  • Citations: 125
  • 10.1088/1741-2552/abb580
Data augmentation for enhancing EEG-based emotion recognition with deep generative models
  • Oct 1, 2020
  • Journal of Neural Engineering
  • Yun Luo + 3 more

Objective. The data scarcity problem in emotion recognition from electroencephalography (EEG) makes it difficult to build an affective model with high accuracy using machine learning algorithms or deep neural networks. Inspired by emerging deep generative models, we propose three methods for augmenting EEG training data to enhance the performance of emotion recognition models. Approach. Our proposed methods are based on two deep generative models, the variational autoencoder (VAE) and the generative adversarial network (GAN), and two data augmentation strategies, full usage and partial usage. In the full usage strategy, all of the generated data are added to the training dataset without judging their quality, while in the partial usage strategy, only high-quality data are selected and appended to the training dataset. The three methods are called conditional Wasserstein GAN (cWGAN), selective VAE (sVAE), and selective WGAN (sWGAN). Main results. To evaluate the effectiveness of the proposed methods, we perform a systematic experimental study on two public EEG datasets for emotion recognition, namely SEED and DEAP. We first generate realistic-like EEG training data in two forms: power spectral density and differential entropy. Then, we augment the original training datasets with different amounts of generated realistic-like EEG data. Finally, we train support vector machines and deep neural networks with shortcut layers to build affective models using the original and augmented training datasets. The experimental results demonstrate that our proposed data augmentation methods based on generative models outperform existing data augmentation approaches such as conditional VAE, Gaussian noise, and rotational data augmentation. We also observe that the amount of generated data should be less than 10 times the size of the original training dataset to achieve the best performance. Significance. The augmented training datasets produced by our proposed sWGAN method significantly enhance the performance of EEG-based emotion recognition models.

  • Research Article
  • Citations: 1
  • 10.4015/s1016237223500096
HYBRID AI MODEL FOR THE DETECTION OF RHEUMATOID ARTHRITIS FROM HAND RADIOGRAPHS
  • Apr 28, 2023
  • Biomedical Engineering: Applications, Basis and Communications
  • R K Ahalya + 2 more

The study aims to develop a computerized hybrid model using artificial intelligence (AI) for the detection of rheumatoid arthritis (RA) from hand radiographs. The objectives of the study include (i) segmentation of proximal interphalangeal (PIP) and metacarpophalangeal (MCP) joints using a deep learning (DL) method, followed by feature extraction using a handcrafted technique, and (ii) classification of RA and non-RA participants using machine learning (ML) techniques. In the proposed study, the hand radiographs are resized to [Formula: see text] pixels and pre-processed using image processing techniques such as sharpening, median filtering, and adaptive histogram equalization. The finger joints are segmented using the U-Net model, and the segmented binary image is converted to a grayscale image using the subtraction method. The features are extracted using the Harris feature extractor, and classification is performed using Random Forest and Adaboost ML classifiers. The study included 50 RA patients and 50 normal subjects for the evaluation of RA. Data augmentation is performed to increase the number of images for the U-Net segmentation technique. For the classification of RA and healthy subjects, the Random Forest classifier obtained an accuracy of 91.25%, whereas the Adaboost classifier had an accuracy of 90%. Thus, the hybrid model using a Random Forest classifier can be used as an effective system for the diagnosis of RA.

  • Conference Article
  • Citations: 1
  • 10.1109/icse.2015.186
SOA4DM: Applying an SOA Paradigm to Coordination in Humanitarian Disaster Response
  • May 1, 2015
  • Kelly Lyons + 1 more

Despite efforts to achieve a sustainable state of control over the management of global crises, disasters are occurring with greater frequency and intensity, and are affecting many more people than ever before, while the resources to deal with them do not grow apace. As we enter 2015, with continued concerns that mega-crises may become the new normal, we need to develop novel methods to improve the efficiency and effectiveness of our management of disasters. Software engineering as a discipline has long had an impact on society beyond its role in the development of software systems. In fact, software engineers have been described as the developers of prototypes for future knowledge workers; tools such as Github and Stack Overflow have demonstrated applications beyond the domain of software engineering. In this paper, we take the potential influence of software engineering one step further and propose using the software service engineering paradigm as a new approach to managing disasters. Specifically, we show how the underlying principles of service-oriented architectures (SOA) can be applied to the coordination of disaster response operations. We describe key challenges in coordinating disaster response and discuss how an SOA approach can address those challenges.

  • Conference Article
  • 10.5555/2819009.2819091
SOA4DM: applying an SOA paradigm to coordination in humanitarian disaster response
  • May 16, 2015
  • Kelly Lyons + 1 more

  • Conference Article
  • Citations: 25
  • 10.1145/3180155.3182519
Sentiment polarity detection for software development
  • May 27, 2018
  • Fabio Calefato + 3 more

Sentiment analysis is increasingly used to study software developers' emotions by mining crowd-generated content within software repositories and information sources. With a few notable exceptions [1][5], empirical software engineering studies have exploited off-the-shelf sentiment analysis tools. However, such tools have been trained on non-technical domains and general-purpose social media, resulting in misclassifications of technical jargon and problem reports [2][4]. In particular, Jongeling et al. [2] show how the choice of the sentiment analysis tool may impact the conclusion validity of empirical studies, because these tools not only disagree with human annotation of developers' communication channels but also disagree among themselves. Our goal is to move beyond the limitations of off-the-shelf sentiment analysis tools when applied in the software engineering domain. Accordingly, we present Senti4SD, a sentiment polarity classifier for software developers' communication channels. Senti4SD exploits a suite of lexicon-based, keyword-based, and semantic features to deal appropriately with the domain-dependent use of a lexicon. We built a Distributional Semantic Model (DSM) to derive the semantic features exploited by Senti4SD. Specifically, we ran word2vec [3] on a collection of over 20 million documents from Stack Overflow, thus obtaining word vectors that are representative of developers' communication style. The classifier is trained and validated using a gold standard of 4,423 Stack Overflow posts, including questions, answers, and comments, which were manually annotated for sentiment polarity. We release the full lab package, which includes both the gold standard and the emotion annotation guidelines, to ease the execution of replications as well as new studies on emotion awareness in software engineering. To inform future research on word embedding for text categorization and information retrieval in software engineering, the replication kit also includes the DSM. Results. The contribution of the lexicon-based, keyword-based, and semantic features is assessed by our empirical evaluation leveraging different feature settings. With respect to SentiStrength [6], a mainstream off-the-shelf tool that we use as a baseline, Senti4SD reduces the misclassifications of neutral and positive posts as emotionally negative. Furthermore, we provide empirical evidence of better performance even in the presence of a minimal set of training documents.

  • Book Chapter
  • Citations: 3
  • 10.1007/978-3-030-33547-2_9
A Comparative Review on the Agile Tenets in the IT Service Management and the Software Engineering Domains
  • Oct 17, 2019
  • Manuel Mora + 3 more

A rigor-oriented paradigm in the Information Technology Service Management (ITSM) domain has permeated multiple international organizations. In contrast, in the Software Engineering (SwE) domain, the agile paradigm has complemented or replaced the rigor-oriented one in the last two decades. Recently, shortened ITSM methods (i.e. FitSM and IT4IT) have emerged. However, due to their novelty, there is scarce analysis in the literature of their main assumed agile tenets. In this study, we identified and compared such tenets (i.e. aims, values, and principles) of the agile SwE and ITSM paradigms, as well as the adherence to them, using a representative body of literature on agile SwE and ITSM methods. Our results identified high adherence in the SwE methods and low adherence in the ITSM methods. Thus, the ITSM methods require a robust theoretical foundation on agile tenets like the one found in the SwE methods.

  • Research Article
  • Citations: 3
  • 10.1016/j.jormas.2024.102152
An artificial intelligence mechanism for detecting cystic lesions on CBCT images using deep learning.
  • Dec 1, 2025
  • Journal of stomatology, oral and maxillofacial surgery
  • Rasool Esmaeilyfard + 2 more

  • Research Article
  • Citations: 17
  • 10.1016/j.ins.2023.03.038
A supervised data augmentation strategy based on random combinations of key features
  • Mar 11, 2023
  • Information Sciences
  • Yongchang Ding + 3 more

  • Research Article
  • Citations: 3
  • 10.1371/journal.pone.0300279
Fuzzy ensemble of fined tuned BERT models for domain-specific sentiment analysis of software engineering dataset.
  • May 28, 2024
  • PloS one
  • Zeeshan Anwar + 4 more

Software engineers post their opinions about various topics on social media, and these opinions can be collectively mined using sentiment analysis. Analyzing these opinions is useful because it can provide insight into developers' feedback about various tools and topics. General-purpose sentiment analysis tools do not work well in the software domain because most of these tools are trained on movie and review datasets. Therefore, efforts are underway to develop domain-specific sentiment analysis tools for the Software Engineering (SE) domain. However, existing domain-specific tools for SE struggle to compute negative and neutral sentiments and cannot be used on all SE datasets. This work presents a hybrid technique based on deep learning and fine-tuned BERT models, i.e., Bert-Base, Bert-Large, Bert-LSTM, Bert-GRU, and Bert-CNN, that is adapted as a domain-specific sentiment analysis tool for Community Question Answering datasets (named Fuzzy Ensemble). Five different variants of fine-tuned BERT are developed on the SE dataset, and an ensemble of these fine-tuned models is formed using fuzzy logic. The trained model is evaluated on four publicly available benchmark datasets, i.e., Stack Overflow, JavaLib, Jira, and Code Review, using various evaluation metrics. The Fuzzy Ensemble model is also compared with state-of-the-art sentiment analysis tools for the software engineering domain, i.e., SentiStrength-SE, Senti4SD, SentiCR, and Generative Pre-Training Transformer (GPT). The GPT model is fine-tuned by the authors for domain-specific sentiment analysis. The Fuzzy Ensemble model addresses the limitations of existing tools and improves accuracy in predicting neutral sentiments, even on diverse datasets. The Fuzzy Ensemble model outperforms state-of-the-art tools, achieving a maximum F1-score of 0.883.

  • Research Article
  • Citations: 16
  • 10.1016/j.ibmed.2021.100037
A novel data augmentation approach for mask detection using deep transfer learning
  • Jan 1, 2021
  • Intelligence-Based Medicine
  • Manas Ranjan Prusty + 2 more

  • Research Article
  • Citations: 42
  • 10.1109/tgrs.2022.3150353
Automatic Fault Delineation in 3-D Seismic Images With Deep Learning: Data Augmentation or Ensemble Learning?
  • Jan 1, 2022
  • IEEE Transactions on Geoscience and Remote Sensing
  • Shizhen Li + 4 more

Delineating seismic faults is one of the main steps in seismic structure interpretation. Recently, deep learning (DL) models have been used for automatic seismic fault interpretation. For DL-based models, two widely used techniques can enhance model performance: data augmentation (DA) and ensemble learning (EL). Qualitative and quantitative analysis of the performance of these two techniques has rarely been studied. In this study, we make detailed comparisons between DL models using DA and EL. For the DL model with DA, we first build a holistically nested Unet (HUnet) model by adding the holistically nested module to the widely used Unet model. Then, we train a HUnet model using the original synthetic dataset and its augmented versions (HUnet-D model for short). In addition, we train a Unet model in the same way for comparison (Unet-D model for short). For the DL model with EL, on the other hand, we first obtain several individual HUnet models, each trained separately on only one type of augmented dataset. Next, we propose a data-driven EL model to integrate these HUnet models. Specifically, we propose an adjoint-net module for the EL model to extract multi-scale features from seismic data, which helps in checking and fine-tuning the fused results. Finally, we qualitatively and quantitatively evaluate these DL models (Unet-D, HUnet-D, and EL-HUnet) using the synthetic validation dataset. Moreover, we apply these models to 3-D field data volumes for automatic fault interpretation. Compared with the coherence attribute and the Unet-D and HUnet-D models, we find that the EL-HUnet model achieves comparable performance, effectively enhancing the precision and continuity of the detected faults.
