As the requirements of organisations change, so do the software systems within them. When changes are made under tight deadlines, software developers often neglect software engineering principles, and the structure of the software deteriorates. A badly structured system is difficult to understand and therefore difficult to change further. To improve the structure, re-modularisation may be carried out, and clustering techniques have been used to automate it. However, the clusters produced by clustering algorithms are difficult to comprehend unless they are labelled appropriately. Manual assignment of labels is tedious, so effort should be directed towards automatic cluster label assignment. In this study, the authors focus on facilitating comprehension of software clustering results by automatically assigning meaningful labels to clusters. To assign labels, the authors use term weighting schemes borrowed from the domains of information retrieval and text categorisation. Although some term weighting schemes have already been used for software cluster labelling, there remains a need to analyse these schemes and related issues in order to identify their strengths and weaknesses for this task. In this context, the authors analyse the behaviour of seven well-known term weighting schemes. They also perform experiments on five software systems to identify the software characteristics that affect the labelling behaviour of these schemes.
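To make the idea of labelling clusters by term weight concrete, the following is a minimal sketch rather than the authors' method. The abstract does not name the seven schemes, so the sketch assumes TF-IDF, a familiar scheme from information retrieval, and assumes that each cluster is represented as a bag of terms (for example, tokens extracted from identifiers). The function name `label_clusters`, the parameter `top_k`, and the sample data are hypothetical.

```python
import math
from collections import Counter


def label_clusters(clusters, top_k=3):
    """Return the top_k highest-weighted terms per cluster as its label.

    clusters: dict mapping a cluster name to a list of terms.
    Assumption: TF-IDF is used here only as a representative scheme.
    """
    n = len(clusters)
    # Term frequency: raw counts of each term within each cluster.
    tf = {name: Counter(terms) for name, terms in clusters.items()}
    # Document frequency: number of clusters in which each term occurs.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())

    labels = {}
    for name, counts in tf.items():
        total = sum(counts.values()) or 1
        # Weight = normalised term frequency * inverse cluster frequency.
        weights = {
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        }
        # Sort by descending weight; break ties alphabetically.
        ranked = sorted(weights, key=lambda term: (-weights[term], term))
        labels[name] = ranked[:top_k]
    return labels


# Hypothetical clusters of identifier tokens.
clusters = {
    "c1": ["parse", "token", "parse", "lexer", "file"],
    "c2": ["render", "widget", "render", "file"],
    "c3": ["socket", "send", "socket", "file"],
}
labels = label_clusters(clusters, top_k=2)
```

In this sketch, a term such as `file` that occurs in every cluster receives a weight of zero and so never appears in a label, which illustrates why the choice of weighting scheme matters for label quality.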