Word Embedding Technique Research Articles

Bug assignment, or bug triage, focuses on identifying the appropriate developers to repair newly discovered bugs, thereby managing them more effectively. Several deep learning-based approaches have been proposed for automated bug assignment. These approaches view automated bug assignment as a text classification task: the textual description of a bug report is used as the input, and the potential fixers are regarded as the output labels. Such approaches typically depend on the classification performance of natural language processing and machine learning techniques. New word embedding and deep learning models continue to emerge. The effectiveness of those approaches depends on the chosen deep learning model, used for classification, and the word embedding model, used for representing bug reports. However, prior research has not empirically evaluated the impact of various word embedding and deep learning models on automated bug assignment. In this paper, we conduct an empirical study to analyze the performance variations among 35 deep learning-based automated bug assignment approaches. These approaches are based on five word embedding techniques, i.e., Word2Vec, GloVe, NextBug, ELMo, and BERT, and seven text classification models, i.e., TextCNN, LSTM, Bi-LSTM, LSTM with attention, Bi-LSTM with attention, MLP, and Naive Bayes. We evaluated these combinations across three benchmark datasets, namely Eclipse JDT, GCC, and Firefox, and their merger, i.e., a cross-project dataset.
Our main observations are: (1) Bi-LSTM with attention and Bi-LSTM using ELMo are significantly superior to other deep learning models on bug assignment tasks in terms of top-k (k = 1, 5, 10) accuracy and MRR; (2) both the summary and the description of bug reports are useful for bug assignment, but the description is more useful than the summary; (3) the training corpus for word embedding models has a significant impact on the performance of deep learning-based bug assignment methods. Our results show the importance of tuning the different components (e.g., word embedding model, classification model, and textual input) in deep learning-based automated bug assignment methods and provide important insights for practitioners and researchers.
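
The two metrics named above, top-k accuracy and MRR (mean reciprocal rank), can be illustrated with a minimal sketch. It assumes each bug report comes with a ranked list of candidate developers produced by some classifier. The function names, developer names, and toy rankings are hypothetical and are not taken from the study.

```python
from typing import Sequence


def top_k_accuracy(rankings: Sequence[Sequence[str]],
                   truths: Sequence[str], k: int) -> float:
    """Fraction of bug reports whose actual fixer is among the first k ranked developers."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         truths: Sequence[str]) -> float:
    """Average of 1/rank of the actual fixer; a report contributes 0 if the fixer is unranked."""
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)  # ranks are 1-based
    return total / len(truths)


# Hypothetical toy data: one ranked developer list per bug report.
rankings = [
    ["alice", "bob", "carol"],
    ["bob", "carol", "alice"],
    ["carol", "alice", "bob"],
]
truths = ["alice", "alice", "dave"]  # actual fixers (illustrative only)

top1 = top_k_accuracy(rankings, truths, k=1)
top3 = top_k_accuracy(rankings, truths, k=3)
mrr = mean_reciprocal_rank(rankings, truths)
print(f"top-1={top1:.3f} top-3={top3:.3f} MRR={mrr:.3f}")
```

Top-k accuracy only asks whether the correct developer appears within the first k suggestions, whereas MRR also rewards placing that developer closer to the top of the list.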

Today, businesses rely on numerous information systems to achieve their production goals and improve their global competitiveness. Semantically integrating those systems is essential for businesses to achieve both. To do so, businesses must rely on standards, the most important of which are data exchange standards (DES). DES focus on the technical and business semantics needed to deliver quality products and services on time. Consequently, the ability of businesses to quickly use and adapt DES to their innovations and processes is crucial. Traditionally, information standards are managed and used 1) in a platform-specific form and 2) usually with standalone, file-based applications. These traditional approaches no longer meet today's business and information agility needs. For example, businesses now must deal with companies and suppliers that use heterogeneous syntaxes for their information. These syntaxes are optimized for individual but differing objectives. Moreover, file-based standards and the usage specifications derived from them cause inconsistencies, since there is neither a single standard format for each usage specification nor a single source of truth for all of them. As the number and types of information systems grow, developing, maintaining, reviewing, and approving standards and their derived usage specifications are becoming more difficult and time-consuming. Each file-based usage specification is typically based on a syntax different from the standard's syntax. As a result, each usage specification must be manually updated as the standard evolves; this can cause significant delays and costs in adopting new and better versions of the standard. The National Institute of Standards and Technology (NIST), in collaboration with the Open Applications Group Inc. (OAGi), has developed a web-based standard lifecycle management tool called SCORE to address these problems.
The objective of this paper is to introduce the SCORE tool and discuss the specific functionality in which a word-embedding technique is employed alongside other schema-matching approaches. Together, they can help standard users update a usage specification after a new version of a standard is released, leading to faster adaptation of DES to new processes.
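
As a rough illustration of how word embeddings can support schema matching in general, the sketch below pairs a field name from an old usage specification with the most similar field name in a new one by cosine similarity. It is not SCORE's actual method. The vectors, field names, and function names are all hypothetical toy values chosen for the example; a real system would load trained embeddings.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical toy word vectors (made up for illustration only).
TOY_VECTORS: Dict[str, List[float]] = {
    "purchase": [0.90, 0.10, 0.00],
    "buy":      [0.85, 0.15, 0.05],
    "order":    [0.80, 0.20, 0.10],
    "ship":     [0.10, 0.90, 0.10],
    "delivery": [0.15, 0.85, 0.20],
    "date":     [0.00, 0.20, 0.90],
    "time":     [0.10, 0.10, 0.95],
}


def embed(field_name: str) -> Optional[List[float]]:
    """Average the vectors of the known tokens in a space-separated field name."""
    vecs = [TOY_VECTORS[t] for t in field_name.lower().split() if t in TOY_VECTORS]
    if not vecs:
        return None  # no token has a vector
    return [sum(component) / len(vecs) for component in zip(*vecs)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(old_field: str, new_fields: Sequence[str]) -> Optional[str]:
    """Return the candidate field whose averaged embedding is closest to old_field."""
    source = embed(old_field)
    if source is None:
        return None
    best_name, best_score = None, -1.0
    for candidate in new_fields:
        target = embed(candidate)
        if target is None:
            continue  # skip candidates with no known tokens
        score = cosine(source, target)
        if score > best_score:
            best_name, best_score = candidate, score
    return best_name


candidates = ["buy time", "delivery date", "ship order"]
print(best_match("purchase date", candidates))
```

Averaging token vectors is only one simple way to embed a multi-word field name; in practice such a score would usually be combined with other matching signals, such as data types or structural position.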

Articles published on Word Embedding Technique

Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features

Optimizing aspect-based sentiment analysis using sentence embedding transformer, bayesian search clustering, and sparse attention mechanism

Diverse ensemble classifier driven Email spam classification using multiple word embedding’s with COCOB optimizer

An empirical assessment of different word embedding and deep learning models for bug assignment

Word Embedding as a Semantic Feature Extraction Technique in Arabic Natural Language Processing: An Overview

Bidirectional Gated Recurrent Unit with Glove Embedding and Attention Mechanism for Movie Review Classification

Suicidal Thought Detection using Max Voting Ensemble Technique

A Comparative Analysis of Word Embeddings Techniques for Italian News Categorization

Aspect Based Sentiment Analysis: Feature Extraction using Latent Dirichlet Allocation (LDA) and Term Frequency - Inverse Document Frequency (TF-IDF) in Machine Learning (ML)

Boosting Prediction of Protein-Protein Interactions using Word Embedding Techniques

Employing word-embedding for schema matching in standard lifecycle management

Personalized Query Expansion with Contextual Word Embeddings

Sentiment Analysis of Roman Urdu Text Using Machine Learning Techniques

Authorship Verification in software engineering education

A Natural-Language-Processing-Based Method for the Clustering and Analysis of Movie Reviews and Classification by Genre

A content-based metric for social media influencer marketing

Representation-centric approach for classification of Consumer Health Questions

Topic Modeling based Text Classification Regarding Islamophobia using Word Embedding and Transformers Techniques
