ISMELL: Assembling LLMs with Expert Toolsets for Code Smell Detection and Refactoring

Abstract

Detecting and refactoring code smells is challenging, laborious, and ongoing work. Although large language models (LLMs) have demonstrated potential in identifying various types of code smells, they also have limitations such as input-output token restrictions, difficulty accessing repository-level knowledge, and difficulty performing dynamic source code analysis. Existing learning-based methods and commercial expert toolsets have advantages in handling complex smells. They can analyze project structures and contextual information in depth, access global code repositories, and utilize advanced code analysis techniques. However, these toolsets are often designed for specific types and patterns of code smells and can only address fixed smells, lacking flexibility and scalability. To resolve that problem, we propose iSMELL, an ensemble approach that employs various code smell detection toolsets via a Mixture of Experts (MoE) architecture for comprehensive code smell detection, and enhances LLMs with the detection results from expert toolsets for refactoring the identified code smells. First, we train an MoE model that, based on input code vectors, outputs the most suitable expert tool for identifying each type of smell. Then, we select the recommended toolsets for code smell detection and obtain their results. Finally, we equip the prompts with the detection results from the expert toolsets, thereby enhancing the refactoring capability of LLMs for code with existing smells and enabling them to provide different solutions based on the type of smell. We evaluate our approach on detecting and refactoring three classical and complex code smells, i.e., Refused Bequest, God Class, and Feature Envy. The results show that, by adopting seven expert code smell toolsets, iSMELL achieved an average F1 score of 75.17% on code smell detection, outperforming LLM baselines by an increase of 35.05% in F1 score. We further evaluate the code refactored by the enhanced LLM. The quantitative and human evaluation results show that iSMELL could improve code quality metrics and conduct satisfactory refactoring of the identified code smells. We believe that our proposed solution could provide new insights into better leveraging LLMs and existing approaches to resolve complex software tasks.
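
The three steps in the abstract (gate, run the selected tool, enrich the prompt) can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the authors' implementation: the linear softmax gate, the two toy experts `tool_a` and `tool_b`, the hand-set `WEIGHTS`, and the prompt wording are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical expert tool type: maps source code to a list of smell findings.
Expert = Callable[[str], List[str]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate(code_vector: List[float], weights: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed linear gate: one weight row per expert, softmax over dot products."""
    names = list(weights)
    scores = [sum(w * x for w, x in zip(weights[n], code_vector)) for n in names]
    return dict(zip(names, softmax(scores)))


def select_expert(code_vector: List[float], weights: Dict[str, List[float]]) -> str:
    """Pick the expert with the highest gate probability."""
    probs = gate(code_vector, weights)
    return max(probs, key=probs.get)


def build_prompt(code: str, findings: List[str]) -> str:
    """Embed the expert tool's findings in a refactoring prompt."""
    lines = ["The following code smells were reported by an expert tool:"]
    lines += [f"- {finding}" for finding in findings]
    lines += ["Refactor the code to remove them.", "", code]
    return "\n".join(lines)


# Toy stand-ins for real detection tools (hypothetical heuristics).
def tool_a(code: str) -> List[str]:
    return ["God Class"] if code.count("def ") > 3 else []


def tool_b(code: str) -> List[str]:
    return ["Feature Envy"] if "customer." in code else []


EXPERTS: Dict[str, Expert] = {"tool_a": tool_a, "tool_b": tool_b}
WEIGHTS = {"tool_a": [1.0, 0.0], "tool_b": [0.0, 1.0]}  # hand-set, not trained


def detect_and_prompt(code: str, code_vector: List[float]) -> Tuple[str, str]:
    """Gate -> run the selected expert -> build the enriched prompt."""
    name = select_expert(code_vector, WEIGHTS)
    findings = EXPERTS[name](code)
    return name, build_prompt(code, findings)
```

In a real system the gate weights would be learned and the experts would be actual detection tools; the sketch only shows how the selection result and the tool output flow into the prompt.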

Similar Papers
  • Conference Article
  • Cite Count Icon 1
  • 10.5753/cibse.2024.28440
An Exploratory Evaluation of Continuous Feedback to Enhance Machine Learning Code Smell Detection
  • May 6, 2024
  • Daniel Cruz + 2 more

Code smells are symptoms of bad design choices implemented in the source code. Several code smell detection tools and strategies have been proposed over the years, including the use of machine learning algorithms. However, we lack empirical evidence on how expert feedback could improve machine-learning-based detection of code smells. This paper proposes and evaluates a conceptual strategy to improve machine-learning detection of code smells by means of continuous feedback. To evaluate the strategy, we follow an exploratory evaluation design to compare the results of smell detection before and after feedback provided by a service acting as a software expert. We focus on four code smells (God Class, Long Method, Feature Envy, and Refused Bequest) detected in 20 Java systems. As a result, we observed that continuous feedback improves the performance of code smell detection. For the detection of the class-level code smells, God Class and Refused Bequest, we achieved an average improvement in F1 of 0.13 and 0.58, respectively, after 50 iterations of feedback. For the method-level code smells, Long Method and Feature Envy, the improvements in F1 were 0.66 and 0.72, respectively.

  • Research Article
  • Cite Count Icon 37
  • 10.1007/s13369-016-2238-8
A Lightweight Approach for Detection of Code Smells
  • Jul 6, 2016
  • Arabian Journal for Science and Engineering
  • Ghulam Rasool + 1 more

The accurate removal of code smells from source code supports activities such as refactoring, maintenance, and examining code quality. A large number of techniques and tools have been presented for the specification and detection of code smells in source code over the last decade, but they still lack accuracy and flexibility due to different interpretations of code smell definitions. Most techniques target the detection of only a few code smells and produce different results on the same examined systems because of differing informal definitions and threshold values of the metrics used for detection. We present a flexible and lightweight approach based on multiple searching techniques for the detection and visualization of all 22 code smells in source code written in multiple languages. Our approach is lightweight and flexible because it applies SQL queries to an intermediate repository and regular expressions to selected source code constructs. The approach is validated through experiments on eight publicly available open source software projects developed in Java and C#, and the results are compared with existing approaches. The accuracy of the presented approach varies from 86% to 97% on the eight selected software projects.
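
To give a feel for regular-expression-based detection on source code constructs, here is a minimal sketch that flags Java methods with long parameter lists. It is not the paper's implementation: the pattern `METHOD_RE`, the function `find_long_parameter_lists`, and the `max_params` threshold are assumptions made for illustration, and the pattern deliberately ignores constructors, annotations, and generic parameter types that contain commas.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified pattern for a Java method header:
# visibility modifier, return type (possibly with extra modifiers), name, parameters.
METHOD_RE = re.compile(
    r"\b(?:public|protected|private)\s+[\w<>\[\], ]+?\s+(\w+)\s*\(([^)]*)\)"
)


def count_params(param_text: str) -> int:
    """Count comma-separated parameters; an empty list counts as zero."""
    stripped = param_text.strip()
    return 0 if not stripped else len(stripped.split(","))


def find_long_parameter_lists(source: str, max_params: int = 4) -> List[Tuple[str, int]]:
    """Return (method name, parameter count) for methods above the threshold."""
    hits = []
    for match in METHOD_RE.finditer(source):
        name, params = match.group(1), match.group(2)
        n = count_params(params)
        if n > max_params:
            hits.append((name, n))
    return hits


JAVA_SNIPPET = """
public class Repo {
    public void save(String a, int b, int c, int d, int e) { }
    private int size() { return 0; }
    public static List<String> names(String prefix) { return null; }
}
"""
```

Calling `find_long_parameter_lists(JAVA_SNIPPET)` reports only `save`, since it is the one method with more than four parameters. A production detector would need a real parser to handle the cases this pattern skips.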

  • Conference Article
  • Cite Count Icon 6
  • 10.1145/3422392.3422427
Applying Machine Learning to Customized Smell Detection
  • Oct 21, 2020
  • Daniel Oliveira + 5 more

Code smells are considered symptoms of poor implementation choices, which may hamper software maintainability. Hence, code smells should be detected as early as possible to avoid software quality degradation. Unfortunately, detecting code smells is not a trivial task. Some preliminary studies concluded that machine learning (ML) techniques are a promising way to better support smell detection. However, these techniques are hard to customize for early and accurate detection of specific smell types. Moreover, ML techniques usually require numerous code examples for training (composing a relevant dataset) in order to achieve satisfactory accuracy. Unfortunately, such a dependency on a large validated dataset is impractical and leads to late detection of code smells. Thus, a prevailing challenge is the early customized detection of code smells given the typically limited training data. In this direction, this paper reports a study in which we collected code smells from ten active projects that were actually refactored by developers, unlike studies that rely on code smells inferred by researchers. These smells were used to evaluate the accuracy of early code smell detection using seven ML techniques. Once we take into account the smells that developers considered important, the ML techniques are able to customize the detection to focus on smells observed as relevant in the investigated systems. The results showed that all the analyzed techniques are sensitive to the type of smell and obtained good results for the majority of them, especially JRip and Random Forest. We also observed that the ML techniques did not need a high number of examples to reach their best accuracy results. This finding implies that ML techniques can be successfully used for early detection of smells without depending on the curation of a large dataset.

  • Research Article
  • Cite Count Icon 2
  • 10.14419/ijet.v7i2.27.14635
Design of testing framework for code smell detection (OOPS) using BFO algorithm
  • Aug 6, 2018
  • International Journal of Engineering & Technology
  • Pratiksha Sharma + 1 more

A bad smell is any indication in program code that may point to a problem affecting software maintenance and evolution. Code smell detection is a major challenge for software developers, and the informal classification of smells has led to the design of various smell detection methods and software tools. This work appraises four code smell detection tools: inFusion, JDeodorant, PMD, and JSpIRIT. It proposes a method for detecting bad code smells in software. For bad smell detection, OOSMs are used to analyze the source code, and plug-ins were implemented to locate the position in the program code where a bad smell appears so that software refactoring can then take place. Code smells are classified into types such as Long Method, PIH, LPL, LC, SS, and God Class. Detecting code smells and applying the correct detection phases when required is important for improving the quality of the code or program. Various tools have been proposed for code smell detection, each characterized by particular properties. The main objective of this research work is to describe our proposed method for using various tools for code smell detection. We identify the major differences between the tools and the differing results we obtained. The major drawback of current research work is that it focuses on one particular language, which restricts the tools to one kind of program only. These tools fail to detect smelly code if any kind of change in environment is encountered. The base paper compares the most popular code smell detection tools on the basis of various factors, such as accuracy and false positive rate, which gives a clear picture of the functionality these tools possess. In this paper, a unique technique is designed to identify code smells (CSs). For this purpose, various object-oriented programming (OOP)-based metrics with their maintainability index are used. Further, code refactoring and optimization techniques are applied to obtain a low maintainability index. Finally, the proposed scheme is evaluated and achieves satisfactory results. The results of the BFOA test showed that the lazy class caused framework defects in DLS, DR, and SE. However, the LPL caused no framework defects whatsoever. The results of the connection rules test showed that the LCCS (Lazy Class Code Smell) caused structured defects in DE and DLS, which corresponded to the results of the BFOA test. In this research work, a proposed method is designed to verify the code smell. For this purpose, different OOP-based software metrics with their MI (Maintainability Index) are utilized. Further, a code refactoring and optimization method is applied to attain a lower maintainability index, and the method is evaluated and achieves satisfactory results.

  • Research Article
  • Cite Count Icon 7
  • 10.3390/app13158770
Integrating Interactive Detection of Code Smells into Scrum: Feasibility, Benefits, and Challenges
  • Jul 29, 2023
  • Applied Sciences
  • Danyllo Albuquerque + 4 more

(Context) Code smells indicate poor coding practices or design flaws, suggesting deeper software quality issues. While addressing code smells promptly improves software quality, traditional detection techniques often fail in continuous detection during software development. (Problem Statement) More recently, Interactive Detection (ID) technique has been proposed, enabling the detection of code smells continuously. Although the use of this technique by developers and organizations is promising, there are no practical recommendations for its use in the context of software development. (Goal) The objective of this study was to propose and evaluate the integration of ID into the widely adopted Scrum framework for agile software development. (Method) To achieve this objective, we utilized a mixed-method approach that combined a comprehensive literature review and expert knowledge to propose the integration. Furthermore, we conducted a focus group and a controlled experiment involving software development activities to evaluate this integration. (Results) The findings revealed that this integration significantly benefitted software development, such as early detection of code smells, increased effectiveness in code smell detection, and improved code quality. These findings shed light on the potential benefits of adopting this integration, offering valuable insights for developers and researchers. (Conclusions) This research emphasized the importance of continuous code smell detection as an integral part of agile development and opened avenues for further research in code quality management within agile methodologies.

  • Conference Article
  • Cite Count Icon 4
  • 10.1109/sisy56759.2022.10036248
Semi-supervised detection of Long Method and God Class code smells
  • Sep 15, 2022
  • Ilija Brdar + 4 more

Code smells are poorly designed parts of code whose removal is essential for sustainable software development. However, recognizing code smells in practice is challenging. Machine Learning (ML)-based code smell detectors could solve this problem. Current ML-based code smell detection approaches are based on supervised learning (SL), which requires a large and diverse dataset for training. Unfortunately, the existing code smell datasets are small, which hinders the performance of the trained SL models. This paper aims to improve the performance of ML-based code smell detectors by employing semi-supervised learning (SSL). SSL models are trained by combining a manually labeled code smell dataset with unlabeled code snippets collected from open-source repositories. Two major SSL techniques are employed: self-training and co-training. Experiments were performed for two code smell types: God Class and Long Method. SSL classifiers significantly outperformed SL classifiers for God Class detection (by 6% F-measure). For Long Method detection, SSL classifiers slightly outperformed SL classifiers (by 1% F-measure). This paper is the first to consider applying SSL for code smell detection. SSL models outperforming SL models in all experiments suggests that SSL holds great potential to improve current code smell detectors, which is essential for their adoption in practice.
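
Self-training, one of the two SSL techniques named above, can be sketched in a few lines. This is a generic illustration, not the paper's setup: the one-dimensional feature (think of a size metric), the nearest-centroid classifier, the confidence measure, and the `threshold` value are all assumptions chosen to keep the example self-contained.

```python
from statistics import mean
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, label) with label 0 = clean, 1 = smelly


def fit_centroids(samples: List[Sample]) -> Tuple[float, float]:
    """Return the mean feature value of class 0 and of class 1."""
    c0 = mean(x for x, y in samples if y == 0)
    c1 = mean(x for x, y in samples if y == 1)
    return c0, c1


def predict(centroids: Tuple[float, float], x: float) -> Tuple[int, float]:
    """Nearest-centroid label plus a confidence in [0, 1] (assumed measure)."""
    c0, c1 = centroids
    d0, d1 = abs(x - c0), abs(x - c1)
    label = 0 if d0 <= d1 else 1
    confidence = abs(d0 - d1) / (d0 + d1) if (d0 + d1) else 1.0
    return label, confidence


def self_train(labeled: List[Sample], unlabeled: List[float],
               threshold: float = 0.6, max_rounds: int = 10):
    """Repeatedly add confidently pseudo-labeled samples and refit."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labeled)
        confident, rest = [], []
        for x in pool:
            label, conf = predict(centroids, x)
            (confident if conf >= threshold else rest).append((x, label))
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return fit_centroids(labeled), labeled
```

Ambiguous samples that never reach the confidence threshold stay unlabeled, which is the usual guard against reinforcing the classifier's own mistakes.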

  • Research Article
  • Cite Count Icon 76
  • 10.1016/j.eswa.2022.117607
Automatic detection of Long Method and God Class code smells through neural source code embeddings
  • May 19, 2022
  • Expert Systems with Applications
  • Aleksandar Kovačević + 6 more

Code smells are structures in code that often harm its quality. Manually detecting code smells is challenging, so researchers proposed many automatic detectors. Traditional code smell detectors employ metric-based heuristics, but researchers have recently adopted a Machine-Learning (ML) based approach. This paper compares the performance of multiple ML-based code smell detection models against multiple metric-based heuristics for detection of God Class and Long Method code smells. We assess the effectiveness of different source code representations for ML: we evaluate the effectiveness of traditionally used code metrics against code embeddings (code2vec, code2seq, and CuBERT). This study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection to the best of our knowledge. This approach helped us leverage the power of transfer learning – our study is the first to explore whether the knowledge mined from code understanding models can be transferred to code smell detection. A secondary contribution of our research is the systematic evaluation of the effectiveness of code smell detection approaches on the same large-scale, manually labeled MLCQ dataset. Almost every study that proposes a detection approach tests this approach on the dataset unique for the study. Consequently, we cannot directly compare the reported performances to derive the best-performing approach.

  • Research Article
  • Cite Count Icon 291
  • 10.1016/j.infsof.2018.12.009
Machine learning techniques for code smell detection: A systematic literature review and meta-analysis
  • Jan 5, 2019
  • Information and Software Technology
  • Muhammad Ilyas Azeem + 3 more

  • Conference Article
  • Cite Count Icon 6
  • 10.23919/softcom55329.2022.9911317
Empirical Assessment on Interactive Detection of Code Smells
  • Sep 22, 2022
  • Danyllo Albuquerque + 5 more

Code smell detection is traditionally supported by Non-Interactive Detection (NID) techniques, which enable developers to reveal smells in later software versions. These techniques only reveal smells in the source code upon an explicit developer request and do not support progressive interaction with the affected code. The later code smells are detected, the higher the effort to refactor the affected code. The notion of Interactive Detection (ID) has emerged to address NID's limitations. An ID technique reveals code smell instances without an explicit developer request, encouraging early detection of code smells. Even though ID seems promising, there is a lack of evidence concerning its impact on code smell detection. Our research focused on evaluating the effectiveness of the ID technique on code smell detection. To do so, we conducted a controlled experiment in which 16 subjects performed experimental tasks. We concluded that using the ID technique led to an increase of 60% in recall and up to 13% in precision when detecting code smells. Consequently, developers could identify more refactoring opportunities using the ID technique than with NID.

  • Research Article
  • Cite Count Icon 6
  • 10.1155/2023/2973250
Intelligent Mining of Association Rules Based on Nanopatterns for Code Smells Detection
  • Apr 13, 2023
  • Scientific Programming
  • D Juliet Thessalonica + 3 more

Software maintenance is an imperative step in software development. Code smells can arise as a result of poor design as well as frequent code changes due to changing needs. Early detection of code smells during software development can help with software maintenance. This work focuses on identifying code smells on Java software using nanopatterns. Nanopatterns are method-level code structures that reflect the presence of code smells. Nanopatterns are extracted using a command-line interface based on the ASM bytecode analysis. Class labels are extracted using three tools, namely inFusion, JDeodorant, and iPlasma. Rules are extracted from nanopatterns using the Apriori algorithm and mapped with the extracted class labels. Best rules are selected using the Border Collie Optimization (BCO) algorithm with the accuracy of the k-NN classifier as the fitness function. The selected rules are stored in the rule base to detect code smells. The objective is to detect a maximum number of code smells with a minimum number of rules. Experiments are carried out on Java software, namely jEdit, Nutch, Lucene, and Rhino. The proposed work detects code smells, namely data class, blob, spaghetti code, functional decomposition, and feature envy, with 98.78% accuracy for jEdit, 97.45% for Nutch, 95.58% for Lucene, and 96.34% for Rhino. The performance of the proposed work is competitive with other well-known methods of detecting code smells.

  • Conference Article
  • Cite Count Icon 58
  • 10.1145/3180155.3182530
The scent of a smell
  • May 27, 2018
  • Fabio Palomba + 4 more

Code smells, i.e., symptoms of poor design and implementation choices applied by programmers during the development of a software project [2], represent an important factor contributing to technical debt [3]. The research community spent a lot of effort studying the extent to which code smells tend to remain in a software project for long periods of time [9], as well as their negative impact on non-functional properties of source code [4, 7]. As a consequence, several tools and techniques have been proposed to help developers in detecting code smells and to suggest refactoring opportunities (e.g., [5, 6, 8]). So far, almost all detectors identify code smells using structural properties of source code. However, recent studies have indicated that code smells detected by existing tools are generally ignored (and thus not refactored) by the developers [1]. A possible reason is that developers do not perceive the code smells identified by the tool as actual design problems or, if they do, they are not able to practically work on such code smells. In other words, there is misalignment between what is considered smelly by the tool and what is actually refactorable by developers. In a previous paper [6], we introduced a tool named TACO that uses textual analysis to detect code smells. The results indicated that textual and structural techniques are complementary: while some code smell instances in a software system can be correctly identified by both TACO and the alternative structural approaches, other instances can be only detected by one of the two [6]. In this paper, we investigate whether code smells detected using textual information are as difficult to identify and refactor as structural smells or if they follow a different pattern during software evolution. 
We firstly performed a repository mining study considering 301 releases and 183,514 commits from 20 open source projects (i) to verify whether textually and structurally detected code smells are treated differently, and (ii) to analyze their likelihood of being resolved with regards to different types of code changes, e.g., refactoring operations. Since our quantitative study cannot explain relation and causation between code smell types and maintenance activities, we perform a qualitative study with 19 industrial developers and 5 software quality experts in order to understand (i) how code smells identified using different sources of information are perceived, and (ii) whether textually or structurally detected code smells are easier to refactor. In both studies, we focused on five code smell types, i.e., Blob, Feature Envy, Long Method, Misplaced Class, and Promiscuous Package. The results of our studies indicate that textually detected code smells are perceived as harmful as the structural ones, even though they do not exceed any typical software metrics' value (e.g., lines of code in a method). Moreover, design problems in source code affected by textual-based code smells are easier to identify and refactor. As a consequence, developers' activities tend to decrease the intensity of textual code smells, positively impacting their likelihood of being resolved. Vice versa, structural code smells typically increase in intensity over time, indicating that maintenance operations are not aimed at removing or limiting them. Indeed, while developers perceive source code affected by structural-based code smells as harmful, they face more problems in correctly identifying the actual design problems affecting these code components and/or the right refactoring operation to apply to remove them.

  • Research Article
  • Cite Count Icon 26
  • 10.7717/peerj-cs.1370
Python code smells detection using conventional machine learning models.
  • May 29, 2023
  • PeerJ Computer Science
  • Rana Sandouka + 1 more

Code smells are poor code design or implementation that affect the code maintenance process and reduce the software quality. Therefore, code smell detection is important in software building. Recent studies utilized machine learning algorithms for code smell detection. However, most of these studies focused on code smell detection using Java programming language code smell datasets. This article proposes a Python code smell dataset for Large Class and Long Method code smells. The built dataset contains 1,000 samples for each code smell, with 18 features extracted from the source code. Furthermore, we investigated the detection performance of six machine learning models as baselines in Python code smells detection. The baselines were evaluated based on Accuracy and Matthews correlation coefficient (MCC) measures. Results indicate the superiority of Random Forest ensemble in Python Large Class code smell detection by achieving the highest detection performance of 0.77 MCC rate, while decision tree was the best performing model in Python Long Method code smell detection by achieving the highest MCC Rate of 0.89.
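
For readers unfamiliar with the Matthews correlation coefficient, it is computed from the four cells of a binary confusion matrix as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The short sketch below implements that standard formula alongside accuracy; the function names and the example counts are illustrative and are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient in [-1, 1]; 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion matrix for a smell detector on 100 samples.
example = dict(tp=45, tn=40, fp=10, fn=5)
```

With these made-up counts the accuracy is 0.85 while the MCC is roughly 0.70, which illustrates why the two measures are reported side by side: MCC penalizes errors in both classes and is less flattering than accuracy.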

  • Research Article
  • Cite Count Icon 18
  • 10.1007/s10664-021-10110-5
Crowdsmelling: A preliminary study on using collective knowledge in code smells detection
  • Mar 17, 2022
  • Empirical Software Engineering
  • José Pereira Dos Reis + 2 more

Code smells are seen as a major source of technical debt and, as such, should be detected and removed. However, researchers argue that the subjectiveness of the code smell detection process is a major hindrance to mitigating the problem of smell-infected code. We proposed the crowdsmelling approach based on supervised machine learning techniques, where the wisdom of the crowd (of software developers) is used to collectively calibrate code smell detection algorithms, thereby lessening the subjectivity issue. This paper presents the results of a validation experiment for the crowdsmelling approach. In the context of three consecutive years of a Software Engineering course, a total "crowd" of around a hundred teams, with an average of three members each, classified the presence of three code smells (Long Method, God Class, and Feature Envy) in Java source code. These classifications were the basis of the oracles used for training six machine learning algorithms. Over one hundred models were generated and evaluated to determine which machine learning algorithms had the best performance in detecting each of the aforementioned code smells. Good performance was obtained for God Class detection (ROC=0.896 for Naive Bayes) and Long Method detection (ROC=0.870 for AdaBoostM1), but much lower performance for Feature Envy (ROC=0.570 for Random Forest). The results suggest that crowdsmelling is a feasible approach for the detection of code smells, but further validation experiments are required to cover more code smells and to increase external validity.

  • Conference Article
  • Cite Count Icon 9
  • 10.23919/cisti.2017.7975961
Code smells detection 2.0: Crowdsmelling and visualization
  • Jun 1, 2017
  • Jose Pereira Dos Reis + 2 more

Background: Code smells have long been catalogued with corresponding mitigating solutions called refactoring operations. However, while the latter are supported in several IDEs, code smells detection scaffolding still has many limitations. Another aspect deserving attention is code smells visualization, to increase software quality awareness, namely in large projects, where maintainability is often the dominating issue. Research problems: Researchers have pointed out that code smells detection is inherently a subjective process and that is probably the main hindrance on providing automatic support. Regarding visualization, customized views are required, because each code smell type may have a different scope. Choosing the right visualization for each code smell type is an open research topic. Expected contributions: This research work focuses on the code smells detection and awareness process, by proposing two symbiotic contributions: crowdsmelling and smelly maps. We envisage that such features will be available in a future generation of interactive development environments (aka IDE 2.0). Crowdsmelling uses the concept of collective intelligence through which programmers around the world will collaboratively contribute to the calibration of code smells detection algorithms (one per each code smell), hopefully improving the detection accuracy and mitigating the subjectivity problem. Smelly maps build upon the aforementioned code smells detection capability and on the previous experience at UNIFACS of setting up a software visualization infrastructure. We expect to represent detected code smells at different abstraction levels with the goal of increasing software quality awareness and facilitating refactoring decisions upon large software systems.

  • Research Article
  • Cite Count Icon 20
  • 10.3390/app14146149
Machine Learning-Based Methods for Code Smell Detection: A Survey
  • Jul 15, 2024
  • Applied Sciences
  • Pravin Singh Yadav + 3 more

Code smells are early warning signs of potential issues in software quality. Various techniques are used in code smell detection, including the Bayesian approach, rule-based automatic antipattern detection, antipattern identification utilizing B-splines, Support Vector Machine direct, SMURF (Support Vector Machines for design smell detection using relevant feedback), and immune-based detection strategy. Machine learning (ML) has taken a great stride in this area. This study includes relevant studies applying ML algorithms from 2005 to 2024 in a comprehensive manner for the survey to provide insight regarding code smell, ML algorithms frequently applied, and software metrics. Forty-two pertinent studies allow us to assess the efficacy of ML algorithms on selected datasets. After evaluating various studies based on open-source and project datasets, this study evaluated additional threats and obstacles to code smell detection, such as the lack of standardized code smell definitions, the difficulty of feature selection, and the challenges of handling large-scale datasets. The current studies only considered a few factors in identifying code smells, while in this study, several potential contributing factors to code smells are included. Several ML algorithms are examined, and various approaches, datasets, dataset languages, and software metrics are presented. This study provides the potential of ML algorithms to produce better results and fills a gap in the body of knowledge by providing class-wise distributions of the ML algorithms. Support Vector Machine, J48, Naive Bayes, and Random Forest models are the most common for detecting code smells. Researchers can find this study helpful in better anticipating and taking care of software development design and implementation issues. 
The findings from this study, which highlight the practical implications of ML algorithms in software quality improvement, will help software engineers fix problems during software design and development to ensure software quality.
