An Empirical Study on Learning-based Techniques for Explicit and Implicit Commit Messages Generation

Abstract

High-quality and appropriate commit messages help developers to quickly understand and track code evolution, which is crucial for the collaborative development and maintenance of software. To relieve developers of the burden of writing commit messages, researchers have proposed various techniques to generate commit messages automatically, among which learning-based techniques have proven to be promising.

Similar Papers
  • Research Article
  • Citations: 24
  • 10.2516/ogst/2011118
Single Event Kinetic Modelling without Explicit Generation of Large Networks: Application to Hydrocracking of Long Paraffins
  • May 1, 2011
  • Oil & Gas Science and Technology – Revue d’IFP Energies nouvelles
  • D Guillaume + 5 more

The single event modelling concept allows developing kinetic models for the simulation of refinery processes. For reaction networks with several hundreds of thousands of species, as is the case for catalytic reforming, rigorous relumping by carbon atom number and branching degree was efficiently employed by assuming chemical equilibrium in each lump. This relumping technique yields a compact lumped model without any loss of information, but requires the full detail of an explicitly generated reaction network. Classic network generation techniques become impractical when the hydrocarbon species contain more than approximately 20 carbon atoms, because of the extremely rapid growth of the reaction network. Hence, implicit relumping techniques were developed in order to compute lumping coefficients without generating the detailed reaction network. Two alternative and equivalent approaches are presented, based either on structural classes or on lateral chain decomposition. These two methods are discussed and the lateral chain decomposition method is applied to the kinetic modelling of long chain paraffin hydroisomerization and hydrocracking. The lateral chain decomposition technique is exactly equivalent to the original calculation method based on the explicitly generated detailed reaction network, as long as Benson’s group contribution method is used to calculate the necessary thermodynamic data in both approaches.

  • Research Article
  • Citations: 22
  • 10.1609/aaai.v35i16.17675
Open Domain Dialogue Generation with Latent Images
  • May 18, 2021
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Ze Yang + 5 more

We consider grounding open domain dialogues with images. Existing work assumes that both an image and a textual context are available, but image-grounded dialogues by nature are more difficult to obtain than textual dialogues. Thus, we propose learning a response generation model with both image-grounded dialogues and textual dialogues by assuming that the visual scene information at the time of a conversation can be represented by an image, and trying to recover the latent images of the textual dialogues through text-to-image generation techniques. The likelihood of the two types of dialogues is then formulated by a response generator and an image reconstructor that are learned within a conditional variational auto-encoding framework. Empirical studies are conducted in both image-grounded conversation and text-based conversation. In the first scenario, image-grounded dialogues, especially under a low-resource setting, can be effectively augmented by textual dialogues with latent images; while in the second scenario, latent images can enrich the content of responses and at the same time keep them relevant to contexts.

  • Research Article
  • Citations: 63
  • 10.1016/j.jvlc.2013.08.006
A survey of Euler diagrams
  • Sep 7, 2013
  • Journal of Visual Languages & Computing
  • Peter Rodgers

  • Research Article
  • Citations: 33
  • 10.1016/j.sbspro.2016.05.434
Travel Intentions among Foreign Tourists for Medical Treatment in Malaysia: An Empirical Study
  • Jun 1, 2016
  • Procedia - Social and Behavioral Sciences
  • Seow Ai Na + 2 more

  • Conference Article
  • Citations: 42
  • 10.1109/icsme52107.2021.00018
On the Evaluation of Commit Message Generation Models: An Experimental Study
  • Sep 1, 2021
  • Wei Tao + 7 more

Commit messages are natural language descriptions of code changes, which are important for program understanding and maintenance. However, writing commit messages manually is time-consuming and laborious, especially when the code is updated frequently. Various approaches utilizing generation or retrieval techniques have been proposed to automatically generate commit messages. To achieve a better understanding of how the existing approaches perform in solving this problem, this paper conducts a systematic and in-depth analysis of the state-of-the-art models and datasets. We find that: (1) Different variants of the BLEU metric are used in previous works, which affects the evaluation and understanding of existing methods. (2) Most existing datasets are crawled only from Java repositories while repositories in other programming languages are not sufficiently explored. (3) Dataset splitting strategies can influence the performance of existing models by a large margin. Some models show better performance when the datasets are split by commit, while other models perform better when the datasets are split by timestamp or by project. Based on our findings, we conduct a human evaluation and find the BLEU metric that best correlates with the human scores for the task. We also collect a large-scale, information-rich, and multi-language commit message dataset MCMD and evaluate existing models on this dataset. Furthermore, we conduct extensive experiments under different dataset splitting strategies and suggest the suitable models under different scenarios. Based on the experimental results and findings, we provide feasible suggestions for comprehensively evaluating commit message generation models and discuss possible future research directions. We believe this work can help practitioners and researchers better evaluate and select models for automatic commit message generation. Our source code and data are available at https://github.com/DeepSoftwareAnalytics/CommitMsgEmpirical.
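The three dataset splitting strategies named in finding (3) can be sketched as follows. This is a minimal illustration, not the authors' code: the record fields `project`, `timestamp`, and `message`, the 20% test share, and all function names are assumptions made for the example.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical record layout (an assumption, not the MCMD schema).
Commit = Dict[str, object]
Split = Tuple[List[Commit], List[Commit]]


def _cut(items: List[Commit], test_ratio: float) -> Split:
    """Put the last `test_ratio` share of `items` into the test set."""
    n_test = max(1, round(len(items) * test_ratio))
    return items[:-n_test], items[-n_test:]


def split_by_commit(commits: List[Commit], test_ratio: float = 0.2, seed: int = 0) -> Split:
    """Shuffle all commits, so one project can appear in both train and test."""
    shuffled = list(commits)
    random.Random(seed).shuffle(shuffled)
    return _cut(shuffled, test_ratio)


def split_by_timestamp(commits: List[Commit], test_ratio: float = 0.2) -> Split:
    """Train on older commits and test on the most recent ones."""
    ordered = sorted(commits, key=lambda c: c["timestamp"])
    return _cut(ordered, test_ratio)


def split_by_project(commits: List[Commit], test_projects: Set[str]) -> Split:
    """Hold out whole projects, so no test project is seen during training."""
    train = [c for c in commits if c["project"] not in test_projects]
    test = [c for c in commits if c["project"] in test_projects]
    return train, test


# Toy data: ten commits from two made-up projects, "a" and "b".
commits = [
    {"project": "a" if i % 2 == 0 else "b", "timestamp": 100 - i, "message": f"change {i}"}
    for i in range(10)
]
```

Splitting by commit lets a model see other commits from the same project (and from later dates) during training, which is one plausible reason why scores can differ so much between strategies.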

  • Conference Article
  • Citations: 5
  • 10.1117/12.2305875
Active learning and structure adaptation in teams of heterogeneous agents: designing organizations of the future
  • May 21, 2018
  • Georgiy M Levchuk + 4 more

Many novel DoD missions, from disaster relief to cyber reconnaissance, require teams of humans and machines with diverse capabilities and intelligence. To succeed, DoD planners organize available personnel and technologies into mission-based teams and organizations. Enabled by the next generation of sensors, new ways to access information, increasing capabilities of robotic platforms, and advances in machine learning and artificial intelligence for distributed inference and control applications, new types of teams are emerging that include autonomous collaborating human and machine agents. Developing models to extract the highest potential from human-machine teaming is the defense technology of the future. While many empirical studies have demonstrated the benefits of alternative organizations, such as adaptive networks command and control structures, traditional computational team design solutions have mostly focused on teams of homogeneous agents (such as swarms or social networks), and simple problems (such as cooperative task allocation, geospatial movement, and collaborative decision making). Because machines and humans often have distinct and complementary skills, team members could perform different roles and have changing relations over time. To improve team performance, new solutions are needed to dynamically adapt team structure to better fit the tasks that a team executes. In this paper, we present a continuation of our work on adaptive self-organizing teams. Our model is based on team active inference, the model that describes the approximate inference as an iterative minimization of the free variational energy encoding the task performance and team process complexity. Our model provides the methodology for adapting the structure of a heterogeneous organization in a distributed manner, where the agents on the team make local decisions to change their roles and relations, which are synchronized through explicit collaborative messages. The roles of agents are defined through decomposition of the generalized task types into groups, and assignment of these groups to agents. We obtain decomposition groups using variational clustering on the factor graph, which defines the contribution of the tasks and their dependencies on the team’s objective function. This clustering constructs regions in the factor graph that trade off independence, work balancing, and the overlap to help the optimized organization obtain globally optimal solutions in a distributed manner under communication uncertainties.

  • Research Article
  • Citations: 13
  • 10.1145/3643675
KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation
  • Jun 4, 2024
  • ACM Transactions on Software Engineering and Methodology
  • Wei Tao + 5 more

Commit messages are natural language descriptions of code changes, which are important for software evolution such as code understanding and maintenance. However, previous methods are trained on the entire dataset without considering the fact that a portion of commit messages adhere to good practice (i.e., good-practice commits), while the rest do not. On the basis of our empirical study, we discover that training on good-practice commits significantly contributes to the commit message generation. Motivated by this finding, we propose a novel knowledge-aware denoising learning method called KADEL. Considering that good-practice commits constitute only a small proportion of the dataset, we align the remaining training samples with these good-practice commits. To achieve this, we propose a model that learns the commit knowledge by training on good-practice commits. This knowledge model enables supplementing more information for training samples that do not conform to good practice. However, since the supplementary information may contain noise or prediction errors, we propose a dynamic denoising training method. This method composes a distribution-aware confidence function and a dynamic distribution list, which enhances the effectiveness of the training process. Experimental results on the whole MCMD dataset demonstrate that our method overall achieves state-of-the-art performance compared with previous methods.

  • Conference Article
  • Citations: 23
  • 10.4230/lipics.ecoop.2018.6
Learning to Accelerate Symbolic Execution via Code Transformation
  • Jul 1, 2018
  • DROPS (Schloss Dagstuhl – Leibniz Center for Informatics)
  • Junjie Chen + 5 more

Symbolic execution is an effective but expensive technique for automated test generation. Over the years, a large number of refined symbolic execution techniques have been proposed to improve its efficiency. However, the symbolic execution efficiency problem remains, and largely limits the application of symbolic execution in practice. Orthogonal to refined symbolic execution, in this paper we propose to accelerate symbolic execution through semantic-preserving code transformation on the target programs. During the initial stage of this direction, we adopt a particular code transformation, compiler optimization, which was originally proposed to accelerate program concrete execution by transforming the source program into another semantic-preserving target program with increased efficiency (e.g., faster or smaller). However, compiler optimizations are mostly designed to accelerate program concrete execution rather than symbolic execution. Recent work also reported that unified settings on compiler optimizations that can accelerate symbolic execution for any program do not exist at all. Therefore, in this work we propose a machine-learning based approach to tuning compiler optimizations to accelerate symbolic execution, whose results may also aid further design of specific code transformations for symbolic execution. In particular, the proposed approach LEO separates source-code functions and libraries through our program-splitter, and predicts individual compiler optimization (i.e., whether a type of code transformation is chosen) separately through analyzing the performance of existing symbolic execution. Finally, LEO applies symbolic execution on the code transformed by compiler optimization (through our local-optimizer). We conduct an empirical study on GNU Coreutils programs using the KLEE symbolic execution engine. The results show that LEO significantly accelerates symbolic execution, outperforming the default KLEE configurations (i.e., turning on/off all compiler optimizations) in various settings, e.g., with the default training/testing time, LEO achieves the highest line coverage in 50/68 programs, and its average improvement rate on all programs is 46.48%/88.92% in terms of line coverage compared with turning on/off all compiler optimizations.

  • Research Article
  • Citations: 21
  • 10.1016/j.aei.2019.02.003
On the role of generating textual description for design intent communication in feature-based 3D collaborative design
  • Jan 1, 2019
  • Advanced Engineering Informatics
  • Yuan Cheng + 3 more

  • Conference Article
  • Citations: 10
  • 10.1145/2381416.2381422
Generating route instructions with varying levels of detail
  • Nov 30, 2011
  • Jürgen Ziegler + 4 more

In this paper, we present a technique for adaptive generation of personalized route instructions based on the driver's knowledge of particular route sections. We evaluated the mechanism with two empirical studies, both attesting significant preference for the adaptively generated presentations over an established online service (Google Maps).

  • Research Article
  • Citations: 27
  • 10.1186/1472-6920-14-184
Family physicians’ professional identity formation: a study protocol to explore impression management processes in institutional academic contexts
  • Sep 6, 2014
  • BMC Medical Education
  • Charo Rodríguez + 11 more

Background: Despite significant differences in terms of medical training and health care context, the phenomenon of medical students’ declining interest in family medicine has been well documented in North America and in many other developed countries as well. As part of a research program on family physicians’ professional identity formation initiated in 2007, the purpose of the present investigation is to examine in-depth how family physicians construct their professional image in academic contexts; in other words, this study will allow us to identify and understand the processes whereby family physicians with an academic appointment seek to control the ideas others form about them as a professional group, i.e. impression management.

Methods/Design: The methodology consists of a multiple case study embedded in the perspective of institutional theory. Four international cases from Canada, France, Ireland and Spain will be conducted; the "case" is the medical school. Four levels of analysis will be considered: individual family physicians, interpersonal relationships, family physician professional group, and organization (medical school). Individual interviews and focus groups with academic family physicians will constitute the main technique for data generation, which will be complemented with a variety of documentary sources. Discourse techniques, more particularly rhetorical analysis, will be used to analyze the data gathered. Within- and cross-case analysis will then be performed.

Discussion: This empirical study is strongly grounded in theory and will contribute to the scant body of literature on family physicians’ professional identity formation processes in medical schools. Findings will potentially have important implications for the practice of family medicine, medical education and health and educational policies.

Electronic supplementary material: The online version of this article (doi:10.1186/1472-6920-14-184) contains supplementary material, which is available to authorized users.

  • Research Article
  • Citations: 10
  • 10.1007/s11432-015-0450-5
An empirical study on constraint optimization techniques for test generation
  • Oct 13, 2016
  • Science China Information Sciences
  • Zhiyi Zhang + 4 more

Constraint solving is a frequent but expensive operation when symbolic execution is used to generate tests for a program. To improve the efficiency of test generation using constraint solving, four optimization techniques are usually applied to existing constraint solvers: constraint independence, constraint set simplification, constraint caching, and expression rewriting. In this paper, we conducted an empirical study, using these four constraint optimization techniques in the well-known test generation tool KLEE with 77 GNU Coreutils applications, to systematically investigate how these optimization techniques affect the efficiency of test generation. The experimental results show that these constraint optimization techniques, as well as their combinations, cannot improve the efficiency of test generation significantly for programs of all sizes. Moreover, we studied the constraint optimization techniques with respect to two static metrics of programs, lines of code (LOC) and cyclomatic complexity (CC). The experimental results show that the "constraint set simplification" technique can improve the efficiency of test generation significantly for programs with high LOC and CC values. The "constraint caching" optimization technique can improve the efficiency of test generation significantly for programs with low LOC and CC values. Finally, we propose four hybrid optimization strategies and practical guidelines based on different static metrics.
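The idea behind constraint caching can be illustrated with a small memoizing wrapper. This is a toy sketch under stated assumptions, not KLEE's implementation: the class `CachingSolver`, its brute-force search over a small integer domain, and the use of constraint text as a cache key are all invented for the example.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Tuple

Assignment = Dict[str, int]
# A constraint is a (canonical text, predicate) pair; the text serves as the cache key.
Constraint = Tuple[str, Callable[[Assignment], bool]]


class CachingSolver:
    """Toy solver that remembers the result for each distinct constraint set."""

    def __init__(self, variables: List[str], domain: Iterable[int]) -> None:
        self.variables = variables
        self.domain = list(domain)
        self.cache: Dict[FrozenSet[str], Optional[Assignment]] = {}
        self.solver_calls = 0  # counts the expensive searches actually performed

    def _brute_force(self, constraints: List[Constraint]) -> Optional[Assignment]:
        """Stand-in for a real solver: try every assignment in the domain."""
        self.solver_calls += 1
        for values in product(self.domain, repeat=len(self.variables)):
            assignment = dict(zip(self.variables, values))
            if all(pred(assignment) for _, pred in constraints):
                return assignment
        return None  # unsatisfiable within the domain

    def solve(self, constraints: List[Constraint]) -> Optional[Assignment]:
        """Return a satisfying assignment, reusing a cached answer when possible."""
        key = frozenset(text for text, _ in constraints)  # order-independent key
        if key not in self.cache:
            self.cache[key] = self._brute_force(constraints)
        return self.cache[key]


solver = CachingSolver(["x", "y"], range(8))
c1 = ("x > 3", lambda a: a["x"] > 3)
c2 = ("x + y == 5", lambda a: a["x"] + a["y"] == 5)
first = solver.solve([c1, c2])   # runs the search
second = solver.solve([c2, c1])  # same set in another order: served from the cache
```

The cache only pays off when the same constraint sets recur, so its benefit depends on the program under test.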

  • Research Article
  • Citations: 6
  • 10.1016/j.jss.2022.111269
Random or heuristic? An empirical study on path search strategies for test generation in KLEE
  • Feb 17, 2022
  • Journal of Systems and Software
  • Zhiyi Zhang + 5 more

  • Research Article
  • Citations: 31
  • 10.1007/s11390-020-0496-0
Learning Human-Written Commit Messages to Document Code Changes
  • Nov 1, 2020
  • Journal of Computer Science and Technology
  • Yuan Huang + 5 more

Commit messages are important complementary information used in understanding code changes. To address message scarcity, several approaches have been proposed for automatically generating commit messages. However, most of these approaches focus on generating a summary of the changed software entities at a superficial level, without considering the intent behind the code changes (e.g., the existing approaches cannot generate a message such as “fixing null pointer exception”). Considering that developers often describe the intent behind the code change when writing the messages, we propose ChangeDoc, an approach to reuse existing messages in version control systems for automatic commit message generation. Our approach includes syntax, semantic, pre-syntax, and pre-semantic similarities. For a given commit without a message, it is able to discover its most similar past commit from a large commit repository, and recommend its message as the message of the given commit. Our repository contains half a million commits that were collected from SourceForge. We evaluate our approach on the commits from 10 projects. The results show that 21.5% of the messages recommended by ChangeDoc can be directly used without modification, and 62.8% require minor modifications. In order to evaluate the quality of the commit messages recommended by ChangeDoc, we performed two empirical studies involving a total of 40 participants (10 professional developers and 30 students). The results indicate that the recommended messages are very good approximations of the ones written by developers and often include important intent information that is not included in the messages generated by other tools.
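The retrieval step described above (find the most similar past commit and reuse its message) can be sketched in a few lines. This is a simplified illustration, not ChangeDoc itself: it swaps in a single Jaccard similarity over identifier tokens in place of the four similarities the abstract names, and the function names and toy history are assumptions.

```python
import re
from typing import List, Set, Tuple


def tokenize(diff: str) -> Set[str]:
    """Extract lowercase identifier-like tokens from a diff."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", diff.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tokens common to both sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_message(new_diff: str, history: List[Tuple[str, str]]) -> Tuple[str, float]:
    """Return the message of the most similar past (diff, message) pair and its score."""
    if not history:
        raise ValueError("history must contain at least one past commit")
    query = tokenize(new_diff)
    scored = [(jaccard(query, tokenize(diff)), message) for diff, message in history]
    score, message = max(scored, key=lambda pair: pair[0])
    return message, score


# Toy history of (diff, message) pairs, invented for illustration.
history = [
    ("+ if (user == null) { return; }", "fix null pointer exception in user lookup"),
    ("- int timeout = 30;\n+ int timeout = 60;", "increase default timeout"),
]
message, score = recommend_message("+ if (order == null) { return; }", history)
```

Because the recommended message is copied from a different commit, it may mention the wrong entity (here "user" rather than "order"), which is consistent with the abstract's report that most recommendations need minor modification.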

  • Research Article
  • 10.55041/isjem01377
Career Recognition & Academic Counseling (CRAC Bot)
  • Mar 23, 2024
  • International Scientific Journal of Engineering and Management
  • Mr Satish Khode

In recent years, the advancement of Artificial Intelligence (AI) technologies has revolutionized various sectors, including education and career guidance. This research paper presents an innovative approach to integrating Career Recognition and Academic Counselling Chatbots with College and Learning Management Systems (LMS) using Generative AI and Large Language Models (LLMs). The proposed system aims to enhance the academic counselling experience for students by providing personalized career guidance, course recommendations, and academic support through an intelligent chatbot interface. Leveraging Generative AI techniques, the chatbot can generate natural language responses and engage in meaningful conversations with users, facilitating efficient communication and knowledge dissemination. Additionally, the integration with College and LMS enables seamless access to academic resources, course materials, and learning opportunities, empowering students to make informed decisions about their academic and career pathways. The research paper discusses the design, development, and implementation of the integrated system, along with a comprehensive evaluation of its effectiveness in enhancing student engagement, academic success, and career readiness. Through empirical studies and user feedback analysis, the research demonstrates the potential impact of AI-driven academic counselling solutions in improving student outcomes and fostering a supportive learning environment. Finally, the paper discusses the implications, challenges, and future directions of integrating Generative AI and LLMs into academic counselling and career recognition systems, highlighting opportunities for further research and innovation in this field. Keywords: Generative AI, LLM, Chatbot, AI-learning Management systems, academic counselling
