Abstract

SUMMARYOne of the most successful applications of textual analysis in software engineering is the use of information retrieval (IR) methods to reconstruct traceability links between software artifacts. Unfortunately, because of the limitations of both the humans developing artifacts and the IR techniques any IR‐based traceability recovery method fails to retrieve some of the correct links, while on the other hand it also retrieves links that are not correct. This limitation has posed challenges for researchers that have proposed several methods to improve the accuracy of IR‐based traceability recovery methods by removing the ‘noise’ in the textual content of software artifacts (e.g., by removing common words or increasing the importance of critical terms). In this paper, we propose a heuristic to remove the ‘noise’ taking into account the linguistic nature of words in the software artifacts. In particular, the language used in software documents can be classified as a technical language, where the words that provide more indication on the semantics of a document are the nouns. The results of a case study conducted on five software artifact repositories indicate that characterizing the context of software artifacts considering only nouns significantly improves the accuracy of IR‐based traceability recovery methods. Copyright © 2012 John Wiley & Sons, Ltd.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.