Enhancing Commit Message Categorization in Open-Source Repositories Using Structured Taxonomy and Large Language Models
Version Control Systems (VCS) manage source code changes by storing modifications in a database. A key feature of VCS is the commit function, which saves the project’s current state and summarizes the changes through a Commit Message (CM). These messages are vital for collaboration, particularly in open-source artificial intelligence (AI) projects on platforms such as GitHub, where contributors work on rapidly evolving codebases. This paper presents an empirical analysis of CMs within open-source AI repositories on GitHub, focusing on their content, the effectiveness of categorization by Large Language Models (LLMs), and the impact of message quality on categorization accuracy. A sample of 384 CMs from 34 repositories was manually categorized to establish a taxonomy. Python was then used for automated keyword extraction, refined with regex patterns. In addition, an experiment assessed the performance of ChatGPT-4 in categorizing CMs, first without guidance and then using our taxonomy. Our findings indicate that CM quality varies greatly, which clearly affects how efficiently messages can be categorized. This study contributes a structured taxonomy of CMs and explores how tools such as ChatGPT-4 can be used to analyze them. The insights from this research are intended to benefit both academic studies and real-world software development, particularly by helping teams better understand and automate the handling of CMs in AI projects.
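The keyword-and-regex step described above can be sketched in Python. The category names and patterns are hypothetical placeholders, not the taxonomy derived from the 384 manually categorized messages. `TAXONOMY` and `categorize` are likewise illustrative names. A message that matches no pattern falls back to an `Unclassified` label, which in practice would be routed to manual review.

```python
import re

# Hypothetical taxonomy: category -> regex of indicative keywords.
# These labels and patterns are illustrative assumptions, not the study's taxonomy.
TAXONOMY = {
    "Bug Fix": r"\b(fix(e[sd])?|bug|patch|resolve[sd]?)\b",
    "Feature": r"\b(add(s|ed)?|implement(s|ed)?|introduce[sd]?|support)\b",
    "Documentation": r"\b(docs?|readme|documentation|typo)\b",
    "Refactoring": r"\b(refactor(s|ed|ing)?|clean ?up|rename[sd]?)\b",
    "Dependency": r"\b(bump(s|ed)?|upgrade[sd]?|requirements?|dependenc(y|ies))\b",
}

# Compile each pattern once, case-insensitively.
COMPILED = {cat: re.compile(pat, re.IGNORECASE) for cat, pat in TAXONOMY.items()}


def categorize(message: str) -> list[str]:
    """Return every category whose keyword pattern occurs in the commit message."""
    matches = [cat for cat, rx in COMPILED.items() if rx.search(message)]
    return matches or ["Unclassified"]


if __name__ == "__main__":
    for msg in ["Fix typo in README", "Add support for bf16 training", "wip"]:
        print(f"{msg!r:35} -> {categorize(msg)}")
```

The function returns every matching category rather than only the first. This keeps multi-purpose commits such as "Fix typo in README" visible. A real pipeline would still need a rule for resolving such overlaps.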
- Conference Article
179
- 10.1109/scam.2014.14
- Sep 1, 2014
Although version control systems allow developers to describe and explain the rationale behind code changes in commit messages, the state of practice indicates that most of the time such commit messages are either very short or even empty. In fact, a recent study of 23K+ Java projects found that only 10% of the messages are descriptive and over 66% of those messages contain fewer words than a typical English sentence (i.e., 15-20 words). However, accurate and complete commit messages summarizing software changes are important to support a number of development and maintenance tasks. In this paper we present an approach, coined Change Scribe, which is designed to generate commit messages automatically from change sets. Change Scribe generates natural language commit messages by taking into account the commit stereotype, the type of changes (e.g., file renames, changes done only to property files), and the impact set of the underlying changes. We evaluated Change Scribe in a survey involving 23 developers in which the participants analyzed automatically generated commit messages from real changes and compared them with commit messages written by the original developers of six open source systems. The results demonstrate that messages automatically generated by Change Scribe are preferred in about 62% of the cases for large commits, and about 54% for small commits.
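The sketch below is a toy illustration of the "type of changes" facts mentioned above, derived from a change set. `FileChange` and `describe_change_types` are hypothetical names, and the rules are assumptions. Change Scribe itself analyzes Java change sets and also uses commit stereotypes and impact sets. This sketch models none of those.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FileChange:
    """One entry of a change set, using git-style status letters (hypothetical model)."""
    status: str         # "A" added, "M" modified, "D" deleted, "R" renamed
    path: str           # path after the change
    old_path: str = ""  # previous path; only meaningful for renames


def describe_change_types(changes: list[FileChange]) -> list[str]:
    """Derive coarse change-type facts from a change set (illustrative rules only)."""
    facts = []
    renames = [c for c in changes if c.status == "R"]
    for c in renames:
        facts.append(f"Rename {c.old_path} to {c.path}")
    # "Changes done only to property files": every touched path ends in .properties.
    if changes and all(PurePosixPath(c.path).suffix == ".properties" for c in changes):
        facts.append("Change set modifies only property files")
    if changes and len(renames) == len(changes):
        facts.append("Change set consists only of file renames")
    return facts


if __name__ == "__main__":
    change_set = [
        FileChange("R", "src/DataLoader.java", old_path="src/Loader.java"),
        FileChange("M", "conf/app.properties"),
    ]
    for fact in describe_change_types(change_set):
        print(fact)
```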
- Research Article
14
- 10.1109/tse.2024.3478317
- Dec 1, 2024
- IEEE Transactions on Software Engineering
Commit Message Generation (CMG) approaches aim to automatically generate commit messages based on given code diffs, which facilitate collaboration among developers and play a critical role in Open-Source Software (OSS). Very recently, Large Language Models (LLMs) have been applied in diverse code-related tasks owing to their powerful generality. Yet, in the CMG field, few studies have systematically explored their effectiveness. This paper conducts the first comprehensive experiment to investigate how far we have come in applying LLMs to generate high-quality commit messages and how to go further in this field. Motivated by a pilot analysis, we first construct a multi-lingual high-quality CMG test set following practitioners’ criteria. Afterward, we re-evaluate diverse CMG approaches and compare them with recent LLMs. To delve deeper into LLMs’ ability, we further propose four manual metrics following the practice of OSS, namely Accuracy, Integrity, Readability, and Applicability, for assessment. Results reveal that LLMs outperform existing CMG approaches overall, and different LLMs carry different advantages, with GPT-3.5 performing best. To further boost LLMs’ performance in the CMG task, we propose an Efficient Retrieval-based In-Context Learning (ICL) framework, namely ERICommiter, which leverages two-step filtering to accelerate retrieval and introduces a semantic/lexical-based retrieval algorithm to construct the ICL examples, thereby guiding LLMs to generate high-quality commit messages. Extensive experiments demonstrate the substantial performance improvement of ERICommiter on various LLMs across different programming languages. Meanwhile, ERICommiter also significantly reduces retrieval time while keeping almost the same performance.
Our research contributes to the understanding of LLMs’ capabilities in the CMG field and provides valuable insights for practitioners seeking to leverage these tools in their workflows.
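The core idea of retrieval-based in-context learning can be sketched as follows:

1. Rank a pool of (diff, message) pairs by similarity to the query diff.
2. Place the closest pairs in the prompt as examples.

This sketch makes several assumptions:

- Lexical similarity is measured as Jaccard overlap of identifier tokens.
- The two-step filtering is omitted entirely.
- `build_icl_prompt` and its prompt format are hypothetical.

It is not ERICommiter's actual retrieval algorithm.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased identifier-like tokens; numbers and punctuation are ignored."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_icl_prompt(query_diff: str, pool: list[tuple[str, str]], k: int = 2) -> str:
    """Select the k (diff, message) pairs most lexically similar to query_diff
    and format them as few-shot examples followed by the query."""
    q = tokens(query_diff)
    ranked = sorted(pool, key=lambda ex: jaccard(q, tokens(ex[0])), reverse=True)
    shots = [f"Diff:\n{d}\nCommit message: {m}\n" for d, m in ranked[:k]]
    return "\n".join(shots + [f"Diff:\n{query_diff}\nCommit message:"])


if __name__ == "__main__":
    pool = [
        ("- lr = 0.1\n+ lr = 0.01", "Lower default learning rate"),
        ("+ import logging", "Add logging import"),
    ]
    print(build_icl_prompt("- lr = 0.01\n+ lr = 0.001", pool, k=1))
```

A lexical score like this is cheap to compute. That matters when the example pool is large, which is the efficiency concern that ERICommiter's filtering step is described as addressing.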
- Book Chapter
1
- 10.1007/978-3-031-80275-1_7
- Jan 1, 2025
- Information systems engineering and management
Leveraging studies on artificial intelligence (AI) stakeholders and success factors, this article sets out to embed an AI perspective in a project management standard and center it around avoiding moral issues—harms, losses, and damages—in AI projects. The study provides an AI Project Framework that identifies the significant differences between AI projects and other information technology (IT) projects, including the AI development lifecycle, risks, issues, and challenges. The study creates a conceptual structure that combines aspects from the International Organization for Standardization (ISO) 21502:2020-12 Project Management standard and the AI project lifecycle. Finally, it weaves a practical framework of interdependencies and success factors for managing AI projects. The study uses an integrative literature review methodology that follows a hermeneutic framework. The study results should offer practical benefits to sponsoring organizations, project sponsors, and project managers in planning and governing AI projects.
- Research Article
33
- 10.1109/tse.2024.3364675
- Apr 1, 2024
- IEEE Transactions on Software Engineering
Commit messages are critical for code comprehension and software maintenance. Writing a high-quality message requires skill and effort. To support developers and reduce their effort on this task, several approaches have been proposed to automatically generate commit messages. Despite the promising performance reported, we have identified three significant and prevalent threats in these automated approaches: 1) the datasets used to train and evaluate these approaches contain a considerable amount of ‘noise’; 2) current approaches only consider commits of a limited diff size; and 3) current approaches can only generate the subject of a commit message, not the message body. The first limitation may let the models ‘learn’ inappropriate messages in the training stage and also lead to inflated performance results in their evaluation. The other two threats can considerably weaken the practical usability of these approaches. Further, with the rapid emergence of large language models (LLMs) that show superior performance in many software engineering tasks, it is worth asking: can LLMs address the challenge of long diffs and whole-message generation? This article first reports the results of an empirical study to assess the impact of these three threats on the performance of state-of-the-art automatic commit message generators. We collected commit data from the top 1,000 most-starred Java projects on GitHub and systematically removed noisy commits with bot-submitted and meaningless messages. We then compared the performance of four approaches representative of the state of the art before and after the removal of noisy messages, and with different lengths of commit diffs. We also conducted a qualitative survey with developers to investigate their perspectives on generating only message subjects. Finally, we evaluated the performance of two representative LLMs, namely UniXcoder and ChatGPT, in generating more practical commit messages.
The results demonstrate that generating commit messages is of great practical value, that considerable work is needed to mature the current state of the art, and that LLMs are an avenue worth trying to address the current limitations. Our analyses provide insights for future work to achieve better performance in practice.
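Here is a minimal sketch of the kind of noise filter described above. It assumes two heuristics:

- author names that look like bots;
- subject lines that are empty or consist of a single content-free word.

The patterns in `BOT_AUTHOR` and `MEANINGLESS` are assumptions for illustration, and `is_noisy` is a hypothetical name. The study's actual filtering rules are not given in this abstract.

```python
import re

# Assumed heuristics for illustration; not the study's actual filtering rules.
BOT_AUTHOR = re.compile(r"\[bot\]|\bbot\b|dependabot|renovate", re.IGNORECASE)
MEANINGLESS = re.compile(
    r"^\s*(wip|update[sd]?|fix(es)?|minor|changes?|commit|\.+|-+)?\s*$",
    re.IGNORECASE,
)


def is_noisy(author: str, message: str) -> bool:
    """Flag a commit as noise if its author looks like a bot or its subject line
    is empty or a single content-free word."""
    stripped = message.strip()
    subject = stripped.splitlines()[0] if stripped else ""
    return bool(BOT_AUTHOR.search(author)) or bool(MEANINGLESS.match(subject))


if __name__ == "__main__":
    commits = [
        ("dependabot[bot]", "Bump torch from 2.0 to 2.1"),
        ("alice", "wip"),
        ("alice", "Add gradient clipping to trainer"),
    ]
    kept = [c for c in commits if not is_noisy(*c)]
    print(kept)
```

The subject pattern is anchored at both ends. As a result, "Update README.md with install steps" is kept, while a bare "update" is dropped.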
- Research Article
17
- 10.1145/3643760
- Jul 12, 2024
- Proceedings of the ACM on Software Engineering
Commit messages play a vital role in software development and maintenance. While previous research has introduced various Commit Message Generation (CMG) approaches, they often fail to consider the broader software context associated with code changes. This limitation resulted in generated commit messages that contained insufficient information and had poor readability. To address these shortcomings, we approached CMG as a knowledge-intensive reasoning task. We employed ReAct prompting with a cutting-edge Large Language Model (LLM) to generate high-quality commit messages. Our tool retrieves a wide range of software context information, enabling the LLM to create commit messages that are factually grounded and comprehensive. Additionally, we gathered commit message quality expectations from software practitioners, incorporating them into our approach to further enhance message quality. Human evaluation demonstrates the overall effectiveness of our CMG approach, which we named Omniscient Message Generator (OMG). It achieved an average improvement of 30.2% over human-written messages and a 71.6% improvement over state-of-the-art CMG methods.
- Research Article
11
- 10.1287/ijds.2023.0007
- Apr 1, 2023
- INFORMS Journal on Data Science
How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
- Conference Article
97
- 10.24963/ijcai.2019/552
- Aug 1, 2019
Commit messages, which summarize source code changes in natural language, are essential for program comprehension and for understanding software evolution. Unfortunately, due to a lack of direct motivation, developers sometimes neglect commit messages, making it necessary to generate such messages automatically. State-of-the-art approaches adopt learning-based techniques such as neural machine translation models for the commit message generation problem. However, they tend to ignore code structure information and suffer from the out-of-vocabulary issue. In this paper, we propose CoDiSum to address these two limitations. In particular, we first extract both code structure and code semantics from the source code changes, and then jointly model these two sources of information to better learn representations of the code changes. Moreover, we augment the model with a copying mechanism to further mitigate the out-of-vocabulary issue. Experimental evaluations on real data demonstrate that the proposed approach significantly outperforms the state of the art in accurately generating commit messages.
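A copying mechanism is commonly formulated as a mixture of two distributions, as in pointer-generator networks:

- a generation distribution over a fixed vocabulary;
- an attention-based copy distribution over source tokens.

The sketch below assumes that standard formulation, `P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w)`. CoDiSum's exact equations may differ. `copy_mixture` is a hypothetical helper that works on plain dictionaries rather than tensors.

```python
def copy_mixture(p_gen: float,
                 vocab_probs: dict[str, float],
                 source_tokens: list[str],
                 attention: list[float]) -> dict[str, float]:
    """Blend generating from the vocabulary with copying from the source:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on positions holding w."""
    if len(source_tokens) != len(attention):
        raise ValueError("need one attention weight per source token")
    final = {w: p_gen * p for w, p in vocab_probs.items()}
    for tok, a in zip(source_tokens, attention):
        # Tokens missing from the vocabulary get probability only through copying.
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * a
    return final


if __name__ == "__main__":
    dist = copy_mixture(0.8, {"fix": 0.6, "add": 0.4}, ["fix", "parseConfigV2"], [0.25, 0.75])
    print(dist)
```

In the usage example, `parseConfigV2` stands in for a code identifier outside the vocabulary. It still receives probability mass through the copy term. This is the sense in which a copying mechanism mitigates the out-of-vocabulary issue.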
- Research Article
- 10.1142/s0218194022500814
- Feb 15, 2023
- International Journal of Software Engineering and Knowledge Engineering
Python has recently become the most widely used language in artificial intelligence (AI) projects, which require large amounts of CPU and memory resources and long training times. To shorten project duration and make AI software systems more reliable, it is essential to handle exceptions appropriately at the code level. However, exception handling relies heavily on the developer’s experience, because Python, as an interpreted language, does not force developers to catch exceptions during development. To resolve this issue, we propose an approach that suggests appropriate exceptions for AI code segments during development, after learning from the existing exception-handling statements in AI projects. This approach learns appropriate token units for the exception code and pretrains an embedding model to capture the semantic features of the code. Additionally, an attention mechanism learns to capture the salient features of the exception code. To evaluate our approach, we collected 32,771 AI projects using two popular AI frameworks (i.e., PyTorch and TensorFlow) and obtained an average Area Under the Precision-Recall Curve (AUPRC) of 0.94. Experimental results show that the proposed method can support developers’ exception handling with better exception proposal performance than the compared models.
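AUPRC can be estimated in more than one way. One common estimator is average precision, the step-wise sum `AP = sum_n (R_n - R_{n-1}) * P_n` taken over descending score thresholds. The sketch below implements that estimator in plain Python as an illustration only. The abstract does not say which estimator or averaging scheme produced the reported 0.94, and `average_precision` is a hypothetical name.

```python
def average_precision(labels: list[int], scores: list[float]) -> float:
    """Step-wise area under the precision-recall curve:
    AP = sum_n (R_n - R_{n-1}) * P_n over descending score thresholds.
    Examples with tied scores are treated as a single threshold."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("AP is undefined without positive examples")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example tied at this threshold before scoring it.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy labels (1 = positive) and scores, for illustration only.
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 5/6, about 0.833
```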
- Research Article
8
- 10.22367/jem.2022.44.18
- Jan 1, 2022
- Journal of Economics and Management
Aim/purpose – This research presents a conceptual stakeholder accountability model for mapping project actors to the conduct for which they should be held accountable in artificial intelligence (AI) projects. AI projects differ from other projects in important ways, including their capacity to inflict harm and affect human and civil rights on a global scale. In-project decisions are high-stakes, and who decides the system’s features is critical. Even well-designed AI systems can be deployed in ways that harm individuals, local communities, and society. Design/methodology/approach – The present study uses a systematic literature review, accountability theory, and AI success factors to elaborate on the relationships between AI project actors and stakeholders. The literature review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement process. Bovens’ accountability model and AI success factors are employed as the basis for the coding framework in the thematic analysis. The study uses a web-based survey to collect data from respondents in the United States and Germany and employs statistical analysis to assess public opinion on AI fairness, sustainability, and accountability. Findings – The AI stakeholder accountability model specifies the complex relationships between 16 actors and 22 stakeholder forums, using 78 AI success factors to define the conduct and the obligations and consequences that characterize those relationships. The survey analysis suggests that more than 80% of the public thinks AI development should be fair and sustainable and sees the government and development organizations as most accountable in this regard. There are some differences between the United States and Germany regarding fairness, sustainability, and accountability. Research implications/limitations – The results should benefit project managers and project sponsors in stakeholder identification and resource assignment.
The definitions offer policy advisors insights for updating AI governance practices. The model presented here is conceptual and has not been validated using real-world projects. Originality/value/contribution – The study adds context-specific information on AI to the project management literature. It defines project actors as moral agents and provides a model for mapping the accountability of project actors to stakeholder expectations and system impacts. Keywords: accountability, artificial intelligence, algorithms, project management, ethics. JEL Classification: C33, M15, O3, O32, O33, Q55.
- Research Article
3
- 10.18535/ijsrm/v11i07.em03
- Jul 30, 2023
- International Journal of Scientific Research and Management (IJSRM)
The rapid advancement of artificial intelligence (AI) has brought transformative changes to project management, necessitating a departure from traditional methodologies previously employed in IT project implementations. This paper explores the evolution of project management from conventional IT approaches to the dynamic demands of AI-driven projects. While foundational principles of project management—such as planning, risk management, and stakeholder communication—remain relevant, AI projects introduce unique challenges and require significant adaptations to existing frameworks. The study begins by delineating the characteristics and core principles of traditional IT project management. Traditional methods are characterized by their structured phases, fixed requirements, and a focus on sequential task execution. These principles have been foundational in achieving success in conventional IT projects through detailed planning, rigorous documentation, and predefined quality assurance measures. In contrast, AI projects are distinguished by their reliance on data, iterative development, and high levels of uncertainty. Unique characteristics of AI projects include the need for continuous experimentation, data-driven decision-making, and adaptability to evolving project requirements. The paper identifies key challenges in managing AI projects, such as dealing with data quality issues, ensuring model interpretability, and addressing ethical considerations. To effectively manage AI projects, project managers must adopt new strategies, including Agile and iterative methodologies that support flexibility and continuous feedback. The study emphasizes the importance of cross-functional teams, as AI projects require diverse expertise from data scientists, engineers, and domain specialists. Additionally, handling the inherent uncertainty in AI projects involves fostering a culture of innovation and adaptability. 
Key differences between traditional IT and AI project management are analyzed, highlighting variations in planning and scoping, risk management, stakeholder communication, and quality assurance. Traditional IT management relies on detailed upfront planning and predictable risk management, whereas AI projects necessitate adaptive planning, dynamic risk assessment, and ongoing model validation. The paper also addresses the transition to AI project management, discussing necessary skill adaptations for project managers, organizational changes to support AI initiatives, and the role of specialized tools and technologies. A hypothetical case study illustrates how traditional IT project management experience can be applied to an AI project, providing insights into practical adaptations and lessons learned. Looking forward, the paper explores emerging trends in project management influenced by AI advancements and emphasizes the need for continuous learning and adaptation. The evolving role of project managers in the AI era is examined, underscoring the importance of embracing new methodologies and technologies to stay relevant. While core project management principles remain integral, the shift to AI-driven projects requires substantial modifications to traditional practices. Project managers must evolve their approaches to navigate the complexities of AI projects effectively, ensuring continued success in an increasingly technology-driven landscape.
- Research Article
- 10.3390/computers14100427
- Oct 7, 2025
- Computers
Commit messages are vital for traceability, maintenance, and onboarding in modern software projects, yet their quality is frequently inconsistent. Recent large language models (LLMs) can transform code diffs into natural language summaries, offering a path to more consistent and informative commit messages. This paper makes two contributions: (i) it provides a systematic survey of automated commit message generation with LLMs, critically comparing prompt-only, fine-tuned, and retrieval-augmented approaches; and (ii) it specifies a transparent, agent-based evaluation blueprint centered on CommitBench. Unlike prior reviews, we include a detailed dataset audit, preprocessing impacts, evaluation metrics, and error taxonomy. The protocol defines dataset usage and splits, prompting and context settings, scoring and selection rules, and reporting guidelines (results by project, language, and commit type), along with an error taxonomy to guide qualitative analysis. Importantly, this work emphasizes methodology and design rather than presenting new empirical benchmarking results. The blueprint is intended to support reproducibility and comparability in future studies.
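The reporting guideline above breaks results down by project, language, and commit type. In practice, this amounts to grouping per-commit scores and aggregating within each group. A minimal sketch follows. It assumes each evaluated commit is a dictionary with hypothetical `project`, `language`, `commit_type`, and `score` fields, where `score` stands for whatever metric the protocol selects.

```python
from collections import defaultdict
from statistics import mean


def report_by(records: list[dict], key: str, metric: str = "score") -> dict[str, float]:
    """Mean of a per-commit metric within each group defined by `key`,
    e.g. key="project", "language", or "commit_type"."""
    groups: dict[str, list[float]] = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[metric])
    return {name: mean(values) for name, values in sorted(groups.items())}


if __name__ == "__main__":
    # Toy values for illustration only; not results from any benchmark.
    results = [
        {"project": "p1", "language": "Python", "commit_type": "fix", "score": 0.4},
        {"project": "p2", "language": "Python", "commit_type": "feat", "score": 0.6},
        {"project": "p3", "language": "Go", "commit_type": "fix", "score": 0.3},
    ]
    for key in ("project", "language", "commit_type"):
        print(key, report_by(results, key))
```

Reporting per-group means alongside the overall mean shows when an aggregate score is dominated by a few large projects or a single language.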
- Research Article
1
- 10.3138/jelis-2024-0033
- Feb 24, 2025
- Journal of Education for Library and Information Science
Artificial Intelligence (AI) is reshaping all sectors of society, including libraries. AI adoption in libraries has been gradual due to concerns and challenges, including ethical issues, maturity of the technology, insufficient AI education and training designed for library and information professionals, and gaps in AI education in library and information science (LIS) programs. This case study reports on the motivations, processes, and evaluations of the IDEA Institute on AI that was developed to equip two cohorts (Fellows) of information professionals who participated in the 2021 and 2022 IDEA Institute on AI with the foundational knowledge and skills to lead AI work. A multi-method approach was used to collect and analyze the evaluation data from multiple sources at different points of the IDEA Institute on AI. The IDEA Institute on AI applied an outcome-based planning and evaluation model and employed formative and summative evaluations using surveys and focus-group discussions. Fellows worked in various library and information environments, most in academic libraries. The case study results showed that the Fellows’ AI knowledge and skills increased substantially, their confidence greatly increased upon completing the IDEA Institute on AI, and they engaged in AI projects in their workplaces. They built awareness of AI issues and challenges and developed a comprehensive understanding of AI within the context of equity, diversity, inclusion, and accessibility. The Fellows’ supervisors were positive about the learning and experience their Fellows gained from the IDEA Institute on AI and their peers. The results of this case study have significant implications for developing AI professional development programs in the LIS field, providing exemplary AI education and training as AI technology evolves, including generative AI and large language models, and integrating AI into LIS curricula.
- Research Article
74
- 10.1016/j.plas.2022.100068
- Nov 4, 2022
- Project Leadership and Society
Stakeholder roles in artificial intelligence projects
- Research Article
7
- 10.1016/j.infsof.2023.107393
- Dec 19, 2023
- Information and Software Technology
Multi-grained contextual code representation learning for commit message generation
- Research Article
3
- 10.1145/3695996
- Jan 18, 2025
- ACM Transactions on Software Engineering and Methodology
Commit messages are important for developers to understand the content of and the reason for code changes. However, poor and even empty commit messages are widespread. To improve the quality of commit messages and development efficiency, many commit message generation methods have been proposed. Nevertheless, previous methods mainly focus on a brief generation problem, where both the input code change and the output commit message are restricted to be short. This raises questions about how these methods perform in practice. In this article, we attempt to remove these restrictions and move the needle forward to a holistic commit message generation problem. In particular, we conduct experiments to evaluate the performance of existing commit message generation methods in holistic commit message generation. In the experiments, we choose seven state-of-the-art commit message generation methods and focus on two important scenarios in commit message generation (i.e., the within-project scenario and the cross-project scenario). To conduct our experiments, we publish a holistic commit message dataset, HORDA, with manually labeled test data. In our evaluations, we find that in generating holistic commit messages, the IR-based method performs better than non-pre-trained generation-based methods in the within-project scenario, contradicting previous research findings. Further, while the pre-trained generation-based methods are better than non-pre-trained generation-based methods, they are still constrained by the limitations of generation models.
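The IR-based family of methods mentioned above can be sketched as nearest-neighbour retrieval:

1. Represent each diff as a bag of words.
2. Find the most similar diff in the training corpus.
3. Reuse that diff's commit message.

This is a simplified illustration under stated assumptions: identifier tokens, cosine similarity, and no re-ranking. The abstract does not specify which IR-based method was evaluated, and `retrieve_message` is a hypothetical name.

```python
import math
import re
from collections import Counter


def bag_of_words(diff: str) -> Counter:
    """Counts of lower-cased identifier-like tokens in a diff."""
    return Counter(re.findall(r"[a-z_][a-z0-9_]*", diff.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_message(query_diff: str, corpus: list[tuple[str, str]]) -> str:
    """Reuse the commit message of the training diff most similar to query_diff."""
    q = bag_of_words(query_diff)
    _, best_message = max(corpus, key=lambda ex: cosine(q, bag_of_words(ex[0])))
    return best_message


if __name__ == "__main__":
    corpus = [
        ("+ def load_checkpoint(path):", "Add checkpoint loading helper"),
        ("- print(loss)\n+ logger.info(loss)", "Use logger instead of print"),
    ]
    print(retrieve_message("- print(acc)\n+ logger.info(acc)", corpus))
```

A retrieval approach of this kind can only return messages that already exist in the corpus. This is one plausible reason for it to fare better within a project, where similar changes recur, than across projects.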