Abstract
Technical Debt (TD) is a metaphor for short-term solutions in software development that may increase cost over the software development life cycle. Researchers have identified many TD types, including but not limited to code debt (CD), design debt (DD), and architecture technical debt (ATD). Several methods have been used to detect technical debt, such as bad smells, software metrics, and code comments. Although TD has attracted considerable research attention, ATD has received less attention than CD and DD, and we found fewer tools for dealing with ATD than for CD and DD. Avoiding TD altogether is impossible, but identifying its risk can help reduce its impact. However, tracking and assessing the ATD risk level in order to make refactoring decisions has not been studied in the open literature. In this dissertation, we systematically study TD and apply multiple case studies to identify the research gaps. We survey practicing software engineers to examine the likelihood and impact of ATD and the benefits and challenges of refactoring. We propose a methodology to identify the ATD risk level of software components and apply machine learning techniques to determine that level. The main contribution of this thesis is a novel methodology for assessing the ATD risk level based on architecture smells and software metrics. The proposed methodology follows an empirical approach: it estimates the ATD risk level by tracking 5,179 architecture smell instances identified in 40 C# project releases collected from GitHub. The method is validated using a dataset that contains 45 Apache Java projects and 3,480 packages. We compared our results with those of related work and assessed their accuracy relative to those results.
First, we compared the ATD risk classification with the Quality Depreciation Index Rule (QDIR); the average classification accuracy was 77% (80% with the Critical Severity level). Next, we compared the ATD risk levels with Refactoring Effort levels; the average classification accuracy was 88% and 81% for 3 and 5 levels, respectively. We then compared the ATD risk levels with the architecture smell levels; the average classification accuracy was 90% and 89% for 3 and 5 levels, respectively. In addition, we used the Wilcoxon rank-sum test (α = 0.01) to verify whether the results of the proposed method are statistically different.