Semantic clustering: Identifying topics in source code

Adrian Kuhn, Stéphane Ducasse, Tudor Gîrba

doi:10.1016/j.infsof.2006.10.017

Abstract

Many existing approaches in Software Comprehension focus on program structure or external documentation. However, analyzing only this formal information overlooks the informal semantics contained in the vocabulary of source code. To understand software as a whole, we need to enrich software analysis with the developer knowledge hidden in the naming used in the code. This paper proposes the use of information retrieval to exploit linguistic information found in source code, such as identifier names and comments. We introduce Semantic Clustering, a technique based on Latent Semantic Indexing and clustering to group source artifacts that use similar vocabulary. We call these groups semantic clusters and we interpret them as linguistic topics that reveal the intention of the code. We compare the topics to each other, identify links between them, provide automatically retrieved labels, and use a visualization to illustrate how they are distributed over the system. Our approach is language independent as it works at the level of identifier names. To validate our approach we applied it to several case studies, two of which we present in this paper. Note: Some of the visualizations presented make heavy use of colors. Please obtain a color copy of the article for better understanding.
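The idea of grouping source artifacts by shared vocabulary can be illustrated with a minimal sketch. It is not the authors' implementation: it omits the Latent Semantic Indexing step entirely and compares raw term-frequency vectors by cosine similarity, and a simple greedy threshold grouping stands in for the clustering step. The helper names (`split_identifier`, `term_vector`, `cosine`, `group_by_vocabulary`, `label`), the 0.5 threshold, the camelCase/underscore splitting rule, and the sample identifiers are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split an identifier (camelCase, snake_case, acronyms) into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def term_vector(identifiers):
    """Count how often each term occurs across the identifiers of one artifact."""
    terms = Counter()
    for identifier in identifiers:
        terms.update(split_identifier(identifier))
    return terms


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_vocabulary(vectors, threshold=0.5):
    """Greedy grouping: an artifact joins the first group that already
    contains a member at least `threshold` similar to it; otherwise it
    starts a new group. (Assumed stand-in for a real clustering algorithm.)"""
    groups = []
    for name, vec in vectors.items():
        for group in groups:
            if any(cosine(vec, vectors[other]) >= threshold for other in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def label(group, vectors, n=2):
    """Label a group with its n most frequent terms (an assumed heuristic)."""
    total = sum((vectors[name] for name in group), Counter())
    return [term for term, _ in total.most_common(n)]


# Hypothetical artifacts, each reduced to the identifiers it contains.
artifacts = {
    "HttpClient": ["sendHttpRequest", "openHttpConnection", "url"],
    "HttpServer": ["handleHttpRequest", "closeHttpConnection", "request"],
    "MatrixMath": ["multiplyMatrix", "matrix_a", "matrix_b", "matrixProduct"],
}
vectors = {name: term_vector(ids) for name, ids in artifacts.items()}
groups = group_by_vocabulary(vectors)
for group in groups:
    print(group, label(group, vectors))
```

In this sketch the two HTTP-related artifacts share the terms "http", "request" and "connection" and end up in one group, while the matrix artifact shares no terms with them and forms its own group. Applying a dimensionality reduction such as LSI before the similarity computation would additionally let artifacts be related through co-occurring rather than identical terms.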

Journal: Information and Software Technology
Publication Date: Jan 4, 2007
Citations: 497

Similar Papers

Supporting program comprehension with program summarization
Yu Liu ... Yun Li
01 Jun 2014

Enriching Reverse Engineering with Semantic Clustering
A Kuhn ... S Ducasse
07 Nov 2005

On the Effect of Semantically Enriched Context Models on Software Modularization
Amir Saeidi ... Jurriaan Hage
The Art, Science, and Engineering of Programming | VOL. 2
05 Aug 2017

Complementing Software Documentation
Pieter Van Der Spek ... Piërre Van De Laar
01 Jan 2009
