As software systems evolve to meet the changing needs of users, understanding the source code becomes a critical step in the process. Clustering techniques, also known as modularization techniques, offer a solution to breaking down complex source code into smaller, more manageable parts. This facilitates improved analysis and understanding of the software’s structure. However, the effectiveness of clustering algorithms in code understanding heavily relies on the chosen criteria. While existing methods typically consider cohesion, coupling, and balance between clusters, we argue that these criteria alone may not fully satisfy one of the primary objectives of clustering, which is to enhance understanding. This is because spaghetti-like structures can be created even when these criteria are satisfied. To address this issue, we introduce two new criteria incorporating program control flow to regulate cluster dependencies. By controlling the uniformity of input and output directions, as well as the distribution of inputs and outputs, clustering algorithms can generate clusters that are more developer-friendly and easier to comprehend. We provide intuitive explanations and real-world projects to demonstrate the effectiveness of our approach, and also incorporate feedback from academics and expert programmers. This paper reveals that integrating these new criteria into existing clustering algorithms enables developers to gain deeper insights into the structure of software systems. This, in turn, leads to better design decisions and improved developer understanding of the source code.
Read full abstract