Abstract

An important application of information retrieval technology is software change impact analysis. Existing information retrieval-based change impact analysis methods select a single model to transform the source code corpus into vectors, a process known as indexing. That model is chosen from two primary options, the bag-of-words model and the word embedding model, each with its own advantages and disadvantages. The bag-of-words model records every word in the source code but ignores contextual information in the corpus. The word embedding model records the contextual information but loses detail for individual words. To address this problem, we propose a structure-driven method for information retrieval-based change impact analysis (named SDM-CIA). SDM-CIA integrates the bag-of-words and word embedding models based on the software's structure. Our experiments on a standard benchmark show that, compared with existing methods, SDM-CIA improves precision, recall, F-score, and MRR by an average of 3.65%, 3.82%, 3.6%, and 10.28%, respectively. These results confirm the effectiveness of SDM-CIA.

Highlights

  • Many researchers have conducted further studies, but their work is still built on the three-step process defined by Marcus et al. This paper is also built on the basic process defined by Marcus et al., but we found a better way to index and to calculate similarity, so we can achieve better change impact analysis performance

  • Precision and recall are defined as $\text{precision} = \frac{|D_c \cap D_r|}{|D_r|} \times 100\%$ and $\text{recall} = \frac{|D_c \cap D_r|}{|D_c|} \times 100\%$, where $D_c$ is the set of all correct source code documents related to the change request, and $D_r$ is the set of source code documents retrieved by the change impact analysis (CIA) method

  • We have presented a structure-driven method for information retrieval-based change impact analysis
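
The precision and recall definitions in the highlights above can be illustrated with a short Python sketch. The function name `precision_recall` and the file names are hypothetical and chosen only for illustration; they do not come from the paper or its benchmark.

```python
def precision_recall(correct, retrieved):
    """Return (precision, recall) as percentages.

    correct   -- documents truly related to the change request
    retrieved -- documents returned by the CIA method
    """
    correct, retrieved = set(correct), set(retrieved)
    hits = len(correct & retrieved)  # size of the intersection
    # Guard against empty sets to avoid division by zero.
    precision = 100.0 * hits / len(retrieved) if retrieved else 0.0
    recall = 100.0 * hits / len(correct) if correct else 0.0
    return precision, recall


# Made-up example: 4 correct documents, 5 retrieved, 2 in common.
correct_docs = {"A.java", "B.java", "C.java", "D.java"}
retrieved_docs = {"A.java", "B.java", "X.java", "Y.java", "Z.java"}
p, r = precision_recall(correct_docs, retrieved_docs)
print(p, r)
```

With these made-up sets, two of the five retrieved documents are correct and two of the four correct documents are retrieved.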


Summary

Introduction

Textual methods make use of pattern matching [23], information retrieval (IR) [7, 8], or natural language processing (NLP) [19]. IR techniques are essentially statistical methods [2]; they require users to submit a textual query that describes the change request. IR techniques transform the source code texts and the query text into vectors, and then calculate the similarity between each source code text and the query (change request) in vector space. Based on these similarities, users can decide which source code is actually related to the query. NLP approaches can also exploit a query, but they analyze the parts of speech of the words used in the source code [2].
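
As a rough illustration of the IR process described above, the following Python sketch indexes documents with a plain bag-of-words model and ranks them by cosine similarity to a query. It is a minimal sketch under stated assumptions, not the SDM-CIA method: the tokenizer, the function names, and the two-file corpus are all hypothetical, and cosine similarity is assumed as the similarity measure.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer: lowercase, keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    # Index a text as a sparse term-frequency vector.
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    # Cosine of the angle between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    # Score every document against the query, most similar first.
    q = bag_of_words(query)
    scored = [(name, cosine_similarity(q, bag_of_words(text)))
              for name, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical corpus: file name -> text extracted from the source file.
corpus = {
    "LoginController.java": "public class LoginController validate user password login session",
    "ReportPrinter.java": "public class ReportPrinter print report page layout",
}
ranking = rank_documents("login fails when password is empty", corpus)
for name, score in ranking:
    print(name, round(score, 3))
```

The user would then inspect the top of the ranking to decide which files are actually affected by the change request. Because this sketch counts words independently, it shows the bag-of-words limitation noted in the abstract: word order and context play no part in the score.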
