Analyzing Hot Bugs in the Linux Kernel by Clustering Fixing Commit Messages

Nikita Alexandrovich Starovoytov, Nikolay Andreevich Golovnev, Sergey Mikhailovich Staroletov

doi:10.15514/ispras-2023-35(3)-16

Abstract

System software environments generate a vast amount of information, and using this information can help improve how such systems operate. One such system is the Linux kernel, which is fully open source and also provides a comprehensive development history through its git repository, where every logical code change is accompanied by a message written by the developer in natural language. Within this large repository, we focus on the messages of bug-fixing commits, since analyzing their text can help identify the most common types of errors. Building on our previous work, this paper proposes using data analysis methods for this purpose. To this end, we explore various techniques for processing repository messages and apply automated methods to identify the prevalent bugs they describe. By computing distances between vector representations of bug-fixing messages and grouping the messages into clusters, we can categorize and isolate the most frequently occurring errors. We apply our approach to several major parts of the Linux kernel, which yields results and insights into the bugs found in different subsystems. As a result, we present a summary of bug fixes in the following parts of the Linux kernel: kernel, sched, mm, net, irq, x86, and arm64.
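The abstract does not specify which vectorization, distance, or clustering algorithm the authors use, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes TF-IDF vectors over word tokens, cosine distance, and single-linkage clustering cut at a hypothetical threshold of 0.6. The `is_fixing_commit` heuristic (a line starting with the kernel's `Fixes:` trailer, or "fix" in the subject line), the stop-word list, and the sample commit subjects are likewise hypothetical. Only the Python standard library is used.

```python
import math
import re
from collections import Counter

# Hypothetical filter (not necessarily the paper's): a commit counts as a bug
# fix if it has a "Fixes:" trailer line or its subject line mentions "fix".
def is_fixing_commit(message: str) -> bool:
    subject = message.splitlines()[0] if message else ""
    has_trailer = re.search(r"^Fixes:", message, re.MULTILINE) is not None
    return has_trailer or "fix" in subject.lower()

# Small illustrative stop-word list; "fix" is dropped because it appears in
# almost every fix subject and would make all messages look alike.
STOPWORDS = {"a", "an", "the", "in", "of", "on", "to", "for", "and", "when", "fix"}

def tokenize(text: str) -> list[str]:
    """Lower-case identifier-like tokens (keeps names such as page_alloc)."""
    words = re.findall(r"[a-z_][a-z0-9_]*", text.lower())
    return [w for w in words if w not in STOPWORDS]

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors (term -> weight), one per message."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors

def cosine_distance(u: dict[str, float], v: dict[str, float]) -> float:
    """1 - cosine similarity; empty vectors are treated as maximally distant."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def cluster(docs: list[str], threshold: float = 0.6) -> list[list[int]]:
    """Single-linkage clustering cut at `threshold`: two messages closer than
    the threshold share a cluster (connected components via union-find)."""
    vecs = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine_distance(vecs[i], vecs[j]) < threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    # Largest clusters first: they approximate the most frequent kinds of bugs.
    return sorted(groups.values(), key=len, reverse=True)

# Invented sample subjects, for illustration only.
commits = [
    "mm: fix NULL pointer dereference in page_alloc",
    "net: fix NULL pointer dereference in tcp_v4_rcv",
    "sched: fix use-after-free of task_struct",
    "irq: fix use-after-free in irq_domain",
    "x86: add new cpuid feature bit",
]
fixes = [m for m in commits if is_fixing_commit(m)]
clusters = cluster(fixes)
for group in clusters:
    print([fixes[i] for i in group])
```

With these invented subjects, the feature commit is filtered out, and the two NULL-dereference fixes and the two use-after-free fixes land in separate clusters. Single linkage was chosen here only for brevity; it can chain loosely related messages together, so average-linkage hierarchical clustering or k-means over the same vectors are common alternatives.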