Articles published on Search-Based Software Engineering
146 Search results
- Research Article
- 10.14201/adcaij.32600
- Nov 17, 2025
- ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal
- Caetano Segundo + 2 more
Several tasks must be performed for a software development project to be completed successfully. Task allocation is a complex step that significantly impacts the project's success. Techniques have been proposed over the years to support it, aiming to minimize cost and development time and to reduce the negative impact of team members leaving the project. In this context, the Truck Factor (TF) is a metric that can quantify this risk and can be used when distributing tasks among team members. The TF concerns the distribution of project knowledge among the development team members, ensuring that knowledge is not concentrated in only one part of the team. This theme is relevant nowadays since team rotation has become frequent due to the increasing demand for software in recent years. Member allocation in software teams has no exact solution since it is an NP-hard problem. Thus, Search-Based Software Engineering (SBSE) techniques, which apply optimization algorithms such as genetic algorithms, have been used in many studies over the years to solve this class of problem. In multi-agent environments, a number of agents perceive their surroundings and communicate to achieve their goals. Researchers in the multi-agent field use simulated environments to validate their research, since the modeling and simulation must consider the main variables of a real environment. Therefore, this work proposes a multi-agent simulation that uses SBSE techniques to minimize the impacts caused by the Truck Factor in a software development team. In the simulations, we model different configurations of software development teams. Through statistical analysis and hypothesis testing, our results show that the proposed approach reduces the impacts of task allocation in software development teams by around 25% when the TF metric is considered during task allocation.
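The Truck Factor estimation described in this abstract can be sketched with a simple greedy model: repeatedly remove the most knowledgeable developer and count how many files are left without any remaining author. This is a minimal illustrative sketch (the file-to-authors mapping and the 50% abandonment threshold are assumptions), not the paper's implementation:

```python
from collections import Counter

def truck_factor(file_authors, threshold=0.5):
    """Greedy Truck Factor estimate: remove the most knowledgeable
    developers one by one until more than `threshold` of the files have
    no remaining author. Simplified authorship model (hypothetical)."""
    knowledge = Counter(a for owners in file_authors.values() for a in owners)
    total = len(file_authors)
    removed = set()
    for author, _ in knowledge.most_common():
        removed.add(author)
        # Files whose entire author set has been removed are "orphaned".
        orphaned = sum(1 for owners in file_authors.values()
                       if not (set(owners) - removed))
        if orphaned / total > threshold:
            return len(removed)
    return len(knowledge)
```

A TF of 1 signals dangerously concentrated knowledge, which is exactly what a TF-aware task allocator would penalize.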
- Research Article
- 10.1007/s10664-025-10733-y
- Nov 4, 2025
- Empirical Software Engineering
- Amid Golmohammadi + 2 more
In this article, we explore the impact of tool development and its evolution in Search-Based Software Engineering (SBSE) research. As a research tool evolves throughout the years, experiments with novel techniques might require reevaluation of previous studies, especially regarding parameter tuning. These reevaluations also give the opportunity to address the threats to external validity of these previous studies by employing a larger selection of artifacts. To conduct the replicated experiments in this study, the search-based fuzzer EvoMaster is chosen. This SBSE tool has been developed and extended throughout several years (since 2016) and tens of scientific studies. Among the chosen tool’s parameters, 6 were carefully selected based on 5 previous studies that we replicate in this article with the latest version of EvoMaster. The replication is applied across an expanded set of artifacts compared to the original replicated studies. Our objective is to validate the robustness and validity of previous findings and to determine the need for parameter tuning in response to the tool’s continuous development. Beyond replication, we explored parameter tuning by testing 729 different configurations to find a more performant parameter set, which is later validated through additional rounds of experiments. Additionally, we analyzed the impact of individual parameters on test generation performance using machine learning models, providing insights into their relative effects. Our findings indicate that, although most parameters maintain their efficacy, 2 of them require adjustment. Furthermore, the investigation into the effects of combining different parameter values reveals that carefully optimized configurations can outperform default settings. These findings highlight the importance of regularly reevaluating parameter settings to enhance tool performance in SBSE research.
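The 729 configurations mentioned above correspond to a full-factorial grid over 6 parameters with 3 candidate values each (3**6 = 729). A sketch of how such a grid can be enumerated; the parameter names in the usage below are hypothetical, not EvoMaster's actual options:

```python
from itertools import product

def parameter_grid(options):
    """Enumerate every configuration from per-parameter candidate lists
    (full-factorial design). With 6 parameters of 3 values each this
    yields 3**6 = 729 configurations, as in the tuning experiment."""
    names = list(options)
    return [dict(zip(names, values))
            for values in product(*(options[name] for name in names))]
```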
- Research Article
- 10.5753/jserd.2025.3906
- Mar 6, 2025
- Journal of Software Engineering Research and Development
- Italo Yeltsin + 5 more
Search-Based Software Engineering (SBSE) transforms Software Engineering (SE) problems into search problems by defining a fitness function that guides the search for an optimal or near-optimal solution. However, designing a fitness function that gives equitable relevance (or weight) to every SE metric associated with an SBSE problem is challenging. The difficulty derives from several properties of SE metric value domains that can induce the search process to privilege specific metrics over others, leading to suboptimal outcomes. To deal with this problem, this work proposes a mathematical model based on the scalarization function concept to better control each metric's relevance in the search process. Our empirical study comprises two computational experiments. The first evaluates the proposed scalarizing-based approach's ability to control the SE metrics in a scenario where all metrics should have the same relevance, while the second covers the scenario where metrics need not have the same relevance. The results demonstrate the importance of properly considering the impact, nature, and value range of SE metrics in the search process, and the effectiveness of the proposed model in controlling SE metric relevance in different scenarios. This research makes three significant contributions. First, we empirically highlight the importance of properly considering the relevance of individual SE metrics in the search process. Second, we propose a generic mathematical model based on scalarizing functions that copes with the normalization process and can be applied to a wide range of SBSE problems. Finally, we show that our scalarizing approach is capable of guiding the search process not only in the scenario where all metric relevances must be equal, but also when relevance varies alongside the optimization process, which is important for the design of fitness functions in SBSE.
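The normalization issue the paper addresses can be illustrated with a basic weighted-sum scalarizing function that rescales each metric to [0, 1] by its known value range before weighting, so a metric with a large domain (e.g., LOC) cannot drown out one with a small domain. This is a generic sketch under assumed metric names, not the paper's exact model:

```python
def scalarize(metrics, weights, bounds):
    """Weighted-sum scalarizing fitness: normalize each SE metric to
    [0, 1] using its known value range before applying its weight, so
    metrics with large domains cannot dominate the search."""
    score = 0.0
    for name, value in metrics.items():
        lo, hi = bounds[name]
        normalized = (value - lo) / (hi - lo) if hi > lo else 0.0
        score += weights[name] * normalized
    return score
```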
- Research Article
- 10.1145/3715002
- Jan 27, 2025
- ACM Transactions on Software Engineering and Methodology
- Daniel Blasco + 4 more
Phylogenetics studies the relationships, in terms of biological history and kinship, of a set of taxa (e.g., species). We argue that in Search-based Software Engineering (SBSE), the individuals of an evolutionary computation-driven population could be considered as taxa for which the leverage of Phylogenetic Inference might be beneficial. In this work, we present our Phylogenetics-aware SBSE approach. Our approach introduces a novel Phylogenetic Operation to promote results which are sufficiently aligned (in terms of lineage) with a certain reference given by the domain expert. Our approach is evaluated in two heterogeneous industrial case studies: Procedural Content Generation from Game Software Engineering, and Feature Location from Software Maintenance. The results are analyzed using quality-of-the-solution and acceptance-by-developers measurements. We performed a statistical analysis to determine whether the impact on the results is significant compared to baselines that do not leverage Phylogenetics. The results show that our approach significantly outperforms two baselines in both case studies. Furthermore, two focus groups confirmed the acceptance of our approach and stressed that solution acceptance may make the difference in industrial environments. Our work has the potential to motivate a new breed of research work on Phylogenetics awareness to produce better results in Software Engineering.
- Research Article
- 10.1007/s10515-024-00473-6
- Jan 21, 2025
- Automated Software Engineering
- Alexander E I Brownlee + 7 more
Ever since the first large language models (LLMs) became available, both academics and practitioners have used them to aid software engineering tasks. However, little research has yet been done on combining search-based software engineering (SBSE) and LLMs. In this paper, we evaluate the use of LLMs as mutation operators for genetic improvement (GI), an SBSE approach, to improve the GI search process. In a preliminary work, we explored the feasibility of combining the Gin Java GI toolkit with OpenAI LLMs in order to generate an edit for the JCodec tool. Here we extend this investigation to three LLMs, three types of prompt, and five real-world software projects. We sample the edits at random, as well as using local search. We also conducted a qualitative analysis to understand why LLM-generated code edits break as part of our evaluation. Our results show that, compared with conventional statement GI edits, LLMs produce fewer unique edits, but these compile and pass tests more often, with the OpenAI model finding test-passing edits 77% of the time. The OpenAI and Mistral LLMs are roughly equal in finding the best run-time improvements. Simpler prompts are more successful than those providing more context and examples. The qualitative analysis reveals a wide variety of areas where LLMs typically fail to produce valid edits, commonly including inconsistent formatting, generating non-Java syntax, or refusing to provide a solution.
- Research Article
- 10.1007/s40747-024-01706-7
- Jan 15, 2025
- Complex & Intelligent Systems
- Yinghan Hong + 5 more
Automated test case generation for path coverage (ATCG-PC) is a major challenge in search-based software engineering due to its complexity as a large-scale black-box optimization problem. However, existing search-based approaches often fail to achieve high path coverage in large-scale unit programs. This is due to their expansive decision space and the presence of hundreds of feasible paths. In this paper, we present a microscale (small-size subsets of the decomposed decision set) search-based algorithm with time-space transfer (MISA-TST). This algorithm aims to identify more accurate subspaces consisting of optimal solutions based on two strategies. The dimension partition strategy employs a relationship matrix to track subspaces corresponding to the target paths. Additionally, the specific value strategy allows MISA-TST to focus the search on the neighborhood of specific dimension values rather than the entire dimension space. Experiments conducted on nine normal-scale and six large-scale benchmarks demonstrate the effectiveness of MISA-TST. The large-scale unit programs encompass hundreds of feasible paths or more than 1.00E+50 test cases. The results show that MISA-TST achieves significantly higher path coverage than other state-of-the-art algorithms in most benchmarks. Furthermore, the combination of the two time-space transfer strategies significantly enhances the performance of search-based algorithms like MISA, especially in large-scale unit programs.
- Research Article
- 10.5753/jserd.2024.3638
- Oct 31, 2024
- Journal of Software Engineering Research and Development
- Heleno De S Campos Junior + 4 more
Software developers often need to combine their contributions. This operation is called merge. When the contributions happen at the same physical region in the source code, the merge is marked as conflicting and must be manually resolved by the developers. Existing studies explore why conflicts happen, their characteristics, and how they are resolved. This paper investigates a subset of merge conflicts, which may be resolved using a combination of existing lines. We analyze 10,177 conflict chunks of popular projects that were resolved by combining existing lines, aiming at characterizing and finding patterns frequently addressed by developers to resolve them. We found that these conflicting chunks and their resolutions are usually small (they have a median of 6 LOC and 3 LOC, respectively). Moreover, 98.6% of the analyzed resolutions preserve the order of the lines in the conflicting chunks. We also found that 72.7% of the chunk resolutions do not interleave lines from different contributions more than once. Finally, developers prefer to resolve conflicts containing only Import statements using lines from the local version of the conflict. When used as heuristics for automatic merge resolution, these findings could reduce the search space by 94.7%, paving the road for future search-based software engineering tools for this problem.
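The order-preservation property reported above (seen in 98.6% of resolutions) can be checked mechanically: every line the resolution reuses from a side must appear in the same relative order as in that side's conflict chunk. A minimal sketch, with conflict sides represented as lists of lines:

```python
def preserves_order(resolution, local, remote):
    """Check that the lines a resolution reuses from each conflict side
    keep their original relative order (the order-preservation heuristic).
    Lines appearing on both sides are attributed to the local side."""
    def is_subsequence(sub, seq):
        it = iter(seq)
        return all(line in it for line in sub)  # 'in' advances the iterator
    from_local = [l for l in resolution if l in local]
    from_remote = [l for l in resolution if l in remote and l not in local]
    return is_subsequence(from_local, local) and is_subsequence(from_remote, remote)
```

A heuristic like this prunes candidate resolutions, which is how such findings can shrink the search space for automatic merge tools.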
- Research Article
- 10.3390/app13159010
- Aug 6, 2023
- Applied Sciences
- Muhammad Abid Jamil + 5 more
Currently, software development is more associated with families of configurable software than with the single implementation of a product. Due to the numerous possible combinations in a software product line, testing these families of software product lines (SPLs) is a difficult undertaking, and the presence of optional features makes exhaustive testing of SPLs impractical. SPLs present many features, but time and financial constraints make testing all of them unfeasible. Thus, testing subsets of configured products is one approach to solving this issue. To reduce the testing effort and obtain better results, alternative methods for testing SPLs are required, such as the combinatorial interaction testing (CIT) technique. Unfortunately, the CIT method produces unscalable solutions for large SPLs with excessive constraints, and its cost grows with the number of feature combinations. The optimization of various conflicting testing objectives, such as reducing the cost and the number of configurations, should also be considered. In this article, we propose a search-based software engineering solution using multi-objective evolutionary algorithms (MOEAs). In particular, the research was applied to several MOEAs: the Indicator-Based Evolutionary Algorithm (IBEA), the Multi-objective Evolutionary Algorithm based on Decomposition (MOEA/D), the Non-dominated Sorting Genetic Algorithm II (NSGA-II), NSGA-III, and the Strength Pareto Evolutionary Algorithm 2 (SPEA2). The results of the algorithms were examined in the context of distinct objectives and two quality indicators. The results revealed how the feature model attributes, implementation context, and number of objectives affected the performances of the algorithms.
- Research Article
- 10.3390/info14030166
- Mar 6, 2023
- Information
- Manikandan Rajagopal + 3 more
Various software engineering paradigms and real-world projects have shown that software testing is the most critical phase in the SDLC. In general, software testing takes approximately 40–60% of the total effort and time involved in project development, and generating test cases is its most important process. Many techniques for the automatic generation of test cases aim to find a smaller set of cases that still achieves an adequacy level, thereby reducing the effort and cost involved in software testing. In the structural testing of a product, efficiently auto-generating path-focused test cases is a challenging process. It is often treated as an optimization problem, and hence search-based methods such as genetic algorithms (GAs) and swarm optimization have been proposed to handle it. The significance of this study is to address the optimization problem of automatic test case generation in search-based software engineering. The proposed methodology aims to close the gap of genetic algorithms becoming trapped in local optima due to poor diversity. Here, dynamic adjustment of the crossover and mutation rates is achieved by calculating each individual's similarity and fitness, and searching for a more global optimum. The proposed method is applied and experimented on a benchmark of five industrial projects. The results of the experiments confirm the efficiency of generating test cases with optimum path coverage.
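The dynamic crossover/mutation adjustment described above can be sketched as a classic adaptive-GA rule: individuals at or above average fitness are perturbed less (rates shrink toward zero as fitness approaches the population best), while below-average individuals keep full rates to preserve diversity. A generic sketch, not the paper's exact scheme; the base rates are illustrative defaults:

```python
def adaptive_rates(fitness, avg_fitness, max_fitness,
                   base_crossover=0.9, base_mutation=0.1):
    """Per-individual adaptive operator rates: fitter-than-average
    individuals are perturbed less, below-average ones keep full
    rates to maintain population diversity."""
    if fitness >= avg_fitness and max_fitness > avg_fitness:
        scale = (max_fitness - fitness) / (max_fitness - avg_fitness)
        return base_crossover * scale, base_mutation * scale
    return base_crossover, base_mutation
```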
- Research Article
- 10.1145/3514233
- Jan 31, 2023
- ACM Transactions on Software Engineering and Methodology
- Tao Chen + 1 more
In the presence of multiple objectives to be optimized in Search-Based Software Engineering (SBSE), Pareto search has been commonly adopted. It searches for a good approximation of the problem's Pareto-optimal solutions, from which the stakeholders choose the most preferred solution according to their preferences. However, when clear preferences of the stakeholders (e.g., a set of weights that reflect relative importance between objectives) are available prior to the search, weighted search is believed to be the first choice, since it simplifies the search via converting the original multi-objective problem into a single-objective one and enables the search to focus on only what the stakeholders are interested in. This article questions such a “weighted search first” belief. We show that the weights can, in fact, be harmful to the search process even in the presence of clear preferences. Specifically, we conduct a large-scale empirical study that consists of 38 systems/projects from three representative SBSE problems, together with two types of search budget and nine sets of weights, leading to 604 cases of comparisons. Our key finding is that weighted search reaches a certain level of solution quality by consuming relatively less resources at the early stage of the search; however, Pareto search is significantly better than its weighted counterpart the majority of the time (up to 77% of the cases), as long as we allow a sufficient, but not unrealistic, search budget. This is a beneficial result, as it discovers a potentially new “rule-of-thumb” for the SBSE community: Even when clear preferences are available, it is recommended to always consider Pareto search by default for multi-objective SBSE problems, provided that solution quality is more important. Weighted search, in contrast, should only be preferred when the resource/search budget is limited, especially for expensive SBSE problems.
This, together with other findings and actionable suggestions in the article, allows us to codify pragmatic and comprehensive guidance on choosing between weighted and Pareto search for SBSE under the circumstance that clear preferences are available. All code and data can be accessed at https://github.com/ideas-labo/pareto-vs-weight-for-sbse.
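The two search modes compared in this study differ in how they evaluate candidates: weighted search collapses the objective vector into one scalar up front, while Pareto search keeps the vector and compares candidates by dominance. Both evaluations can be sketched in a few lines (minimization assumed):

```python
def dominates(a, b):
    """Pareto dominance (minimization): a dominates b if it is no worse
    in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def weighted_value(objectives, weights):
    """Weighted-sum collapse into a single scalar: the up-front conversion
    that weighted search performs before running a single-objective search."""
    return sum(w * o for w, o in zip(weights, objectives))
```

Note that two weighted-equal solutions can still be Pareto-incomparable, which is one reason the scalar view can mislead the search.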
- Research Article
- 10.2298/csis220830036c
- Jan 1, 2023
- Computer Science and Information Systems
- Yiji Chen + 2 more
Search-Based Software Engineering (SBSE) is one of the techniques used for software defect prediction (SDP), in which search-based optimization algorithms are used to identify the optimal solution for constructing a prediction model. The ranking methods of SBSE are used to solve insufficient-sample problems, and the feature selection approaches of SBSE are employed to enhance the prediction model's performance under curse-of-dimensionality or class imbalance problems. However, the process of building prediction models may face a complex problem consisting of all of the above problems, which has so far been ignored. To address this complex problem, two multi-objective learning-to-rank methods are proposed, which are used to search for the optimal linear classifier model and reduce redundant and irrelevant features. To evaluate the performance of the proposed methods, extensive experiments have been conducted on 11 software programs selected from the NASA and AEEEM repositories. Friedman's rank test results show that the proposed method using NSGA-II outperforms other state-of-the-art single-objective methods for software defect prediction.
- Research Article
- 10.1109/tcyb.2021.3089633
- Jan 1, 2023
- IEEE Transactions on Cybernetics
- Yi Xiang + 3 more
Constrained multiobjective optimization problems widely exist in real-world applications. To handle them, the balance between constraints and objectives is crucial, but remains challenging due to non-negligible impacts of problem types. In our context, the problem types refer particularly to those determined by the relationship between the constrained Pareto-optimal front (PF) and the unconstrained PF. Unfortunately, there has been little awareness of how to achieve this balance when faced with different types of problems. In this article, we propose a new constraint handling technique (CHT) by taking into account potential problem types. Specifically, inspired by prior work, problems are classified into three primary types (I, II, and III), with the constrained PF being made up of the entire, part, or none of the unconstrained counterpart, respectively. Clearly, any problem must be one of the three types. For each possible type, there exists a tailored mechanism to handle the relationship between constraints and objectives (i.e., constraint priority, objective priority, or the switch between them). It is worth mentioning that exact problem types are not required because we just consider their possibilities in the new CHT. Conceptually, we show that the new CHT can make a tradeoff among different types of problems. This argument is confirmed by experimental studies performed on 38 benchmark problems, whose types are known, and a real-world problem (with unknown types) in search-based software engineering. Results demonstrate that within both decomposition-based and nondecomposition-based frameworks, the new CHT can indeed achieve a good tradeoff among different problem types, being better than several state-of-the-art CHTs.
- Research Article
- 10.1016/j.procs.2023.10.256
- Jan 1, 2023
- Procedia Computer Science
- Gabriela Czibula + 3 more
An unsupervised learning-based methodology for uncovering behavioural patterns for specific types of software defects
- Research Article
- 10.1109/tse.2021.3121253
- Nov 1, 2022
- IEEE Transactions on Software Engineering
- Francisca Perez + 3 more
In Search-Based Software Engineering, more than 100 works have involved the human in the search process to obtain better results. However, the case where the human completely replaces the fitness function remains neglected. There is a good reason for that; no matter how intelligent the human is, humans cannot assess millions of candidate solutions as heuristics do. In this work, we study the influence of using the Human as the Fitness Function (HaFF) on the quality of the results. To do that, we focus on Search-Based Model-Driven Engineering (SBMDE) because inspecting models should require less human effort than inspecting code thanks to the abstraction of models. Therefore, we analyze the impact of HaFF in a real-world industrial case study of feature location in models. Furthermore, we also consider a reformulation operation (replacement) in the evaluation because a recent work reported that this operation significantly reduces the number of iterations required in comparison to the widespread crossover and mutation operations. The combination of HaFF and the reformulation operation (HaFF_R) improves the results of the best baseline by 0.15% in recall and 14.26% in precision. Analyzing the results, we learned how to better leverage HaFF_R, which increased the improvement with regard to the best baseline to 1.15% in recall and 20.05% in precision. HaFF_R significantly improves precision because humans are immune to the main limitations of the baselines: vocabulary mismatch and tacit knowledge. A focus group confirmed the acceptance of HaFF. These results are relevant for SBMDE because feature location is one of the main activities performed during maintenance and evolution. Our results, and what we learned from them, can also motivate and help other researchers to explore the benefits of HaFF. In fact, we provide a guideline that further discusses how to apply HaFF to other software engineering problems.
- Research Article
- 10.1109/tsusc.2022.3160491
- Oct 1, 2022
- IEEE Transactions on Sustainable Computing
- Akram Alofi + 3 more
Blockchain technology has gained recognition in industrial, financial, and various technological domains for its potential in decentralizing trust in peer-to-peer systems. A core component of blockchain technology is a consensus algorithm, most commonly Proof of Work (PoW). PoW is used in blockchain-based systems to establish trust among peers; however, it does require the expenditure of an enormous amount of energy that affects the environmental sustainability of blockchain-based systems. Energy minimization, whilst ensuring trust within blockchain-based systems that use PoW, is a challenging problem. The solution has to consider how energy consumption can be minimized without compromising trust, whilst still ensuring, for instance, scalability, security, and decentralization. In this paper, we represent the problem as a subset selection problem of miners in a blockchain-based system. We formulate the problem of blockchain energy consumption as a Search-Based Software Engineering problem with four objectives: energy consumption, carbon emission, decentralization, and trust. We propose a model composed of multiple fitness functions. The model can be used to explore the complex search space by selecting a subset of miners that minimizes the energy consumption without drastically impacting the primary goals of the blockchain technology (i.e., security/trustworthiness and decentralization). We integrate our proposed fitness functions into five evolutionary algorithms to solve the problem of blockchain miners selection. Our results show that the environmental sustainability of blockchain-based systems (e.g. reduced energy use) can be enhanced with little degradation in other competing objectives. We also report on the performance of the algorithms used.
- Research Article
- 10.1007/s10664-022-10127-4
- Aug 6, 2022
- Empirical Software Engineering
- Jiahui Wu + 4 more
On the preferences of quality indicators for multi-objective search algorithms in search-based software engineering
- Research Article
- 10.14704/nq.2022.20.5.nq22163
- May 2, 2022
- NeuroQuantology
- Divya Sharma + 1 more
Software refactoring is a technique for reorganising and improving the efficiency of existing software code. By refining the non-functional aspects of software, numerous refactoring methods are currently being employed to build more intelligible and less complex code. By applying multiple techniques to the source code, refactoring can improve code maintainability even further while preserving the behaviour of the code. Refactoring allows for the eradication of bugs and the expansion of the program's capabilities. This paper provides a comprehensive assessment of source code with bad smells and of the influence on software quality of using certain refactoring methodologies to eradicate them. Between 2008 and 2022, a total of 76 studies from 42 journals, 20 conferences, and 14 additional sources were available. Each study was graded on the number of bad smells identified, the refactoring strategies applied, and their impact on software metrics. The "Long Method", "Feature Envy", and "Data Class" smells were discovered or corrected in the majority of inquiries; the "Feature Envy" smell was detected in 39.66 per cent of the nominated investigations. The majority of studies looked at the effects of refactoring on software complexity and coupling measures. Surprisingly, instead of proprietary software, the majority of the investigations employed massive open-source datasets released in Java. Finally, this research makes suggestions for further research into code refactoring.
- Research Article
- 10.1016/j.jss.2022.111349
- Apr 27, 2022
- Journal of Systems and Software
- Javier Yuste + 2 more
An efficient heuristic algorithm for software module clustering optimization
- Research Article
- 10.1080/24751839.2022.2047470
- Mar 24, 2022
- Journal of Information and Telecommunication
- Tsutomu Kumazawa + 2 more
Model checking is a formal and automated verification technique to show that a software system behaves in accordance with a given specification. Traditional model checking uses exhaustive search techniques for finding behaviours that violate the specification. These techniques, however, often do not work for huge systems because they demand a huge amount of computational resources. Search-Based Software Engineering is known to effectively solve many software engineering problems, including model checking. It pursues a good balance between efficiency and solution quality by using swarm intelligence and metaheuristic search methodologies. This article focuses on state-of-the-art model checking with Ant Colony Optimization, a metaheuristic, population-based and stochastic optimization algorithm. We propose two exploration strategies to further improve this balance in model checking based on Ant Colony Optimization. The proposed strategies introduce different kinds of randomized selection mechanisms to diversify the solutions found by many agents, helping the search algorithm extend the reachable regions effectively. Through numerical experiments, we confirmed that the proposed strategies require less computation time and memory than existing model checking with Ant Colony Optimization, at the cost of finding slightly lower-quality solutions.
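The randomized selection at the heart of Ant Colony Optimization can be sketched as roulette-wheel selection over pheromone-weighted successor states; the paper's two strategies add further randomization on top of mechanisms like this. A generic ACO sketch, not the authors' implementation:

```python
import random

def ant_next_state(pheromone, heuristic, alpha=1.0, beta=2.0, rng=random):
    """Roulette-wheel successor selection: the probability of moving to
    state i is proportional to pheromone[i]**alpha * heuristic[i]**beta."""
    weights = [p ** alpha * h ** beta for p, h in zip(pheromone, heuristic)]
    r = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        if r < w:
            return i
        r -= w
    return len(weights) - 1
```

Tuning `alpha` and `beta` shifts the balance between following accumulated pheromone and following the local heuristic, which is one lever for the diversification the article studies.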
- Research Article
- 10.1007/s11042-021-11882-0
- Feb 2, 2022
- Multimedia Tools and Applications
- Amarjeet Prajapati + 4 more
Multimedia in search-based software engineering: challenges and opportunities within a new research domain