Software Clustering: Unifying Syntactic and Semantic Features

Janardan Misra,Shubhashis Sengupta,K.M Annervaz,Gary Titus,Vikrant Kaulgud

doi:10.1109/wcre.2012.21

Abstract

Software clustering is an important technique for extracting high level component architecture from the underlying source code. One of the limitations of the existing approaches is that most of the proposed techniques use only similar types of features for estimating distance between source code elements. Therefore, in cases, where the selected features are poorly present in the source code, these techniques may not produce good quality results in absence of adequate inputs to work on. In this paper we propose an approach to overcome this limitation. Proposed approach uses a combination of multiple types of features together and applies automated weighing on the extracted features to enhance their information quality and to reduce noise. We define a way to estimate distance between code elements in terms of combination of multiple types of features. Weighted graph partitioning with a multi-objective global modularity criterion is used to select the clusters as architectural components. We describe methods for automated labeling of the extracted components and for generating inter-component interactions. We further discuss how the suggested approach extends to clustering at multiple hierarchical levels, to application portfolios, and even for improving precision for the feature location problem.

Full Text