Accelerate Literature Icon
Want to do a literature review? Try our new Literature Review workflow

NxtUnit: Automated Unit Test Generation for Go

  • TL;DR
  • Abstract
  • Literature Map
  • Similar Papers
TL;DR

NxtUnit is an automated unit test generator for Go that employs random testing, achieving a 20.74% code coverage on in-house repositories and saving engineers 48% of testing time. It offers IDE, CLI, and browser interfaces, facilitating quick smoke testing and quality feedback.

Abstract
Translate article icon Translate Article Star icon

Automated test generation has been extensively studied for dynamically compiled or typed programming languages like Java and Python. However, Go, a popular statically compiled and typed programming language for server application development, has received limited support from existing tools. To address this gap, we present NxtUnit, an automatic unit test generation tool for Go that uses random testing and is well-suited for microservice architecture. NxtUnit employs a random approach to generate unit tests quickly, making it ideal for smoke testing and providing quick quality feedback. It comes with three types of interfaces: an integrated development environment (IDE) plugin, a command-line interface (CLI), and a browser-based platform. The plugin and CLI tool allow engineers to write unit tests more efficiently, while the platform provides unit test visualization and asynchronous unit test generation. We evaluated NxtUnit by generating unit tests for 13 open-source repositories and 500 ByteDance in-house repositories, resulting in a code coverage of 20.74% for in-house repositories. We conducted a survey among Bytedance engineers and found that NxtUnit can save them 48% of the time on writing unit tests. We have made the CLI tool available at https://github.com/bytedance/nxt_unit.

Similar Papers
  • Research Article
  • Cite Count Icon 3
  • 10.1145/3715778
Less Is More: On the Importance of Data Quality for Unit Test Generation
  • Jun 19, 2025
  • Proceedings of the ACM on Software Engineering
  • Junwei Zhang + 5 more

Unit testing is crucial for software development and maintenance. Effective unit testing ensures and improves software quality, but writing unit tests is time-consuming and labor-intensive. Recent studies have proposed deep learning (DL) techniques or large language models (LLMs) to automate unit test generation. These models are usually trained or fine-tuned on large-scale datasets. Despite growing awareness of the importance of data quality, there has been limited research on the quality of datasets used for test generation. To bridge this gap, we systematically examine the impact of noise on the performance of learning-based test generation models. We first apply the open card sorting method to analyze the most popular and largest test generation dataset, Methods2Test, to categorize eight distinct types of noise. Further, we conduct detailed interviews with 17 domain experts to validate and assess the importance, reasonableness, and correctness of the noise taxonomy. Then, we propose CleanTest, an automated noise-cleaning framework designed to improve the quality of test generation datasets. CleanTest comprises three filters: a rule-based syntax filter, a rule-based relevance filter, and a model-based coverage filter. To evaluate its effectiveness, we apply CleanTest on two widely-used test generation datasets, i.e., Methods2Test and Atlas. Our findings indicate that 43.52% and 29.65% of datasets contain noise, highlighting its prevalence. Finally, we conduct comparative experiments using four LLMs (i.e., CodeBERT, AthenaTest, StarCoder, and CodeLlama7B) to assess the impact of noise on test generation performance. The results show that filtering noise positively influences the test generation ability of the models. Fine-tuning the four LLMs with the filtered Methods2Test dataset, on average, improves its performance by 67% in branch coverage, using the Defects4J benchmark. For the Atlas dataset, the four LLMs improve branch coverage by 39%. Additionally, filtering noise improves bug detection performance, resulting in a 21.42% increase in bugs detected by the generated tests.

  • Research Article
  • Cite Count Icon 1
  • 10.1145/3765758
Reference-Based Retrieval-Augmented Unit Test Generation
  • Dec 3, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Zhe Zhang + 5 more

Automated unit test generation has been widely studied, with Large Language Models (LLMs) recently showing significant potential. LLMs like GPT-4, trained in vast text and code data, excel in various code-related tasks, including unit test generation. However, existing LLM-based approaches often focus solely on the context within the code itself, such as referenced variables, while neglecting broader task-specific contexts, such as the utility of referring to existing tests of relevant methods in unit test generation. Moreover, in the context of unit test generation, these tools prioritize high code coverage, often at the expense of practical usability, correctness, and maintainability. In response, we propose Reference-Based Retrieval Augmentation , a novel mechanism that extends LLM-based Retrieval-Augmented Generation (RAG) to retrieve relevant information by considering task-specific context. In the unit test generation task, for a given focal method, the reference relationships is defined as the reusability or referentiality of tests between the focal method and other methods. To generate high-quality unit tests for the focal method, the test reference relationships are then used to retrieve relevant methods and their existing unit tests. Specifically, we account for the unique structure of unit tests by dividing the test generation process into Given , When , and Then phases. When generating unit tests for a focal method, we retrieve pre-existing tests of other relevant methods, which can provide valuable insights for any of the Given , When , and Then phases. We implement this approach in a tool called RefTest , which sequentially performs preprocessing, test reference retrieval, and unit test generation, using an incremental strategy in which newly generated tests guide the creation of subsequent ones. We evaluated RefTest on 12 open-source projects with 1515 methods, and the results demonstrate that RefTest consistently outperforms existing tools in terms of correctness, completeness, and maintainability of the generated tests.

  • Research Article
  • Cite Count Icon 24
  • 10.1007/s10009-009-0115-4
GenUTest: a unit test and mock aspect generation tool
  • Sep 3, 2009
  • International Journal on Software Tools for Technology Transfer
  • Benny Pasternak + 2 more

Unit testing plays a major role in the software development process. What started as an ad hoc approach is becoming a common practice among developers. It enables the immediate detection of bugs introduced into a unit whenever code changes occur. Hence, unit tests provide a safety net of regression tests and validation tests which encourage developers to refactor existing code with greater confidence. One of the major corner stones of the agile development approach is unit testing. Agile methods require all software classes to have unit tests that can be executed by an automated unit-testing framework. However, not all software systems have unit tests. When changes to such software are needed, writing unit tests from scratch, which is hard and tedious, might not be cost effective. In this paper we propose a technique which automatically generates unit tests for software that does not have such tests. We have implemented GenUTest, a prototype tool which captures and logs interobject interactions occurring during the execution of Java programs, using the aspect-oriented language AspectJ. These interactions are used to generate JUnit tests. They also serve in generating mock aspects—mock object-like entities, which enable testing units in isolation. The generated JUnit tests and mock aspects are independent of the tool, and can be used by developers to perform unit tests on the software. Comprehensiveness of the unit tests depends on the software execution. We applied GenUTest to several open source projects such as NanoXML and JODE. We present the results, explain the limitations of the tool, and point out direction to future work to improve the code coverage provided by GenUTest and its scalability.

  • Conference Article
  • Cite Count Icon 36
  • 10.1109/icse43902.2021.00138
Automatic Unit Test Generation for Machine Learning Libraries: How Far Are We?
  • May 1, 2021
  • Song Wang + 5 more

Automatic unit test generation that explores the input space and produces effective test cases for given programs have been studied for decades. Many unit test generation tools that can help generate unit test cases with high structural coverage over a program have been examined. However, the fact that existing test generation tools are mainly evaluated on general software programs calls into question about its practical effectiveness and usefulness for machine learning libraries, which are statistically orientated and have fundamentally different nature and construction from general software projects. In this paper, we set out to investigate the effectiveness of existing unit test generation techniques on machine learning libraries. To investigate this issue, we conducted an empirical study on five widely used machine learning libraries with two popular unit testcase generation tools, i.e., EVOSUITE and Randoop. We find that (1) most of the machine learning libraries do not maintain a high-quality unit test suite regarding commonly applied quality metrics such as code coverage (on average is 34.1%) and mutation score (on average is 21.3%), (2) unit test case generation tools, i.e., EVOSUITE and Randoop, lead to clear improvements in code coverage and mutation score, however, the improvement is limited, and (3) there exist common patterns in the uncovered code across the five machine learning libraries that can be used to improve unit test case generation tasks.

  • Conference Article
  • Cite Count Icon 61
  • 10.5555/2337223.2337309
BALLERINA: automatic generation and clustering of efficient random unit tests for multithreaded code
  • Jun 2, 2012
  • Adrian Nistor + 4 more

Testing multithreaded code is hard and expensive. Each multithreaded unit test creates two or more threads, each executing one or more methods on shared objects of the class under test. Such unit tests can be generated at random, but basic generation produces tests that are either slow or do not trigger concurrency bugs. Worse, such tests have many false alarms, which require human effort to filter out. We present BALLERINA, a novel technique for automatic generation of efficient multithreaded random tests that effectively trigger concurrency bugs. BALLERINA makes tests efficient by having only two threads, each executing a single, randomly selected method. BALLERINA increases chances that such a simple parallel code finds bugs by appending it to more complex, randomly generated sequential code. We also propose a clustering technique to reduce the manual effort in inspecting failures of automatically generated multithreaded tests. We evaluate BALLERINA on 14 real-world bugs from 6 popular codebases: Groovy, Java JDK, jFreeChart, Log4j, Lucene, and Pool. The experiments show that tests generated by BALLERINA can find bugs on average 2X-10X faster than various configurations of basic random generation, and our clustering technique reduces the number of inspected failures on average 4X-8X. Using BALLERINA, we found three previously unknown bugs in Apache Pool and Log4j, one of which was already confirmed and fixed.

  • Research Article
  • Cite Count Icon 80
  • 10.1145/3660783
Evaluating and Improving ChatGPT for Unit Test Generation
  • Jul 12, 2024
  • Proceedings of the ACM on Software Engineering
  • Zhiqiang Yuan + 6 more

Unit testing plays an essential role in detecting bugs in functionally-discrete program units ( e.g. , methods). Manually writing high-quality unit tests is time-consuming and laborious. Although the traditional techniques are able to generate tests with reasonable coverage, they are shown to exhibit low readability and still cannot be directly adopted by developers in practice. Recent work has shown the large potential of large language models (LLMs) in unit test generation. By being pre-trained on a massive developer-written code corpus, the models are capable of generating more human-like and meaningful test code. In this work, we perform the first empirical study to evaluate the capability of ChatGPT ( i.e ., one of the most representative LLMs with outstanding performance in code generation and comprehension) in unit test generation. In particular, we conduct both a quantitative analysis and a user study to systematically investigate the quality of its generated tests in terms of correctness, sufficiency, readability, and usability. We find that the tests generated by ChatGPT still suffer from correctness issues, including diverse compilation errors and execution failures (mostly caused by incorrect assertions); but the passing tests generated by ChatGPT almost resemble manually-written tests by achieving comparable coverage, readability, and even sometimes developers’ preference. Our findings indicate that generating unit tests with ChatGPT could be very promising if the correctness of its generated tests could be further improved. Inspired by our findings above, we further propose ChatTester , a novel ChatGPT-based unit test generation approach, which leverages ChatGPT itself to improve the quality of its generated tests. Chat Tester incorporates an initial test generator and an iterative test refiner. Our evaluation demonstrates the effectiveness of ChatTester by generating 34.3 % more compilable tests and 18.7 % more tests with correct assertions than the default ChatGPT. In addition to ChatGPT, we further investigate the generalization capabilities of ChatTester by applying it to two recent open-source LLMs ( i.e. , CodeLlama-Instruct and CodeFuse) and our results show that ChatTester can also improve the quality of tests generated by these LLMs.

  • Research Article
  • Cite Count Icon 9
  • 10.1007/s10766-015-0363-8
Automatic Generation of Unit Tests for Correlated Variables in Parallel Programs
  • Mar 17, 2015
  • International Journal of Parallel Programming
  • Ali Jannesari + 1 more

A notorious class of concurrency bugs are race condition related to correlated variables, which make up about 30 % of all non-deadlock concurrency bugs. A solution to prevent this problem is the automatic generation of parallel unit tests. This paper presents an approach to generate parallel unit tests for variable correlations in multithreaded code. We introduce a hybrid approach for identifying correlated variables. Furthermore, we estimate the number of potentially violated correlations for methods executed in parallel. In this way, we are capable of creating unit tests that are suited for race detectors considering correlated variables. We were able to identify more than 85 % of all race conditions on correlated variables in eight applications after applying our parallel unit tests. At the same time, we reduced the number of unnecessary generated unit tests. In comparison to a test generator unaware of variable correlations, redundant unit tests are reduced by up to 50 %, while maintaining the same precision and accuracy in terms of the number of detected races.

  • Book Chapter
  • Cite Count Icon 7
  • 10.1007/978-3-540-77966-7_20
GenUTest: A Unit Test and Mock Aspect Generation Tool
  • Oct 23, 2007
  • Benny Pasternak + 2 more

Unit testing plays a major role in the software development process. It enables the immediate detection of bugs introduced into a unit whenever code changes occur. Hence, unit tests provide a safety net of regression tests and validation tests which encourage developers to refactor existing code. Nevertheless, not all software systems contain unit tests. When changes to such software are needed, writing unit tests from scratch might not be cost effective. In this paper we propose a technique which automatically generates unit tests for software that does not have such tests.We have implemented GenUTest, a tool which captures and logs inter-object interactions occurring during the execution of Java programs. These interactions are used to generate JUnit tests. They also serve in generating mock aspects - mock object like entities, which assist the testing process. The interactions are captured using the aspect oriented language AspectJ.

  • Conference Article
  • Cite Count Icon 62
  • 10.1109/icse.2012.6227145
Ballerina: Automatic generation and clustering of efficient random unit tests for multithreaded code
  • Jun 1, 2012
  • Adrian Nistor + 4 more

Testing multithreaded code is hard and expensive. A multithreaded unit test creates two or more threads, each executing one or more methods on shared objects of the class under test. Such unit tests can be generated at random, but basic random generation produces tests that are either slow or do not trigger concurrency bugs. Worse, such tests have many false alarms, which require human effort to filter out. We present Ballerina, a novel technique for automated random generation of efficient multithreaded tests that effectively trigger concurrency bugs. Ballerina makes tests efficient by having only two threads, each executing a single, randomly selected method. Ballerina increases chances that such simple parallel code finds bugs by appending it to more complex, randomly generated sequential code. We also propose a clustering technique to reduce the manual effort in inspecting failures of automatically generated multithreaded tests. We evaluate Ballerina on 14 real-world bugs from six popular codebases: Groovy, JDK, JFreeChart, Apache Log4j, Apache Lucene, and Apache Pool. The experiments show that tests generated by Ballerina find bugs on average 2×-10× faster than basic random generation, and our clustering technique reduces the number of inspected failures on average 4×-8×. Using Ballerina, we found three previously unknown bugs, two of which were already confirmed and fixed.

  • Dissertation
  • 10.31979/etd.kddt-d7ms
Multi-Model Unit Test Generation Framework With Reinforcement Learning
  • Jan 1, 2025
  • Tasman Kuang

Unit test generation is a critical step in the software development lifecycle to ensure code quality and reduce the likelihood of bugs. Manually writing unit tests can be time-consuming and require an experienced developer. However with the emergence of generative AI, large language models (LLMs) in particular have demonstrated their effectiveness in generating code, which naturally brings up the question of the possibility of applying this capability to automate unit test generation. One of the newer techniques in this field is using Reinforcement Learning (RL) to train a model to generate quality unit tests. RL is the practice of training an agent to take optimal actions to maximize a reward signal. By treating the LLM as an agent and fine-tuning its parameters through feedback from the reward signal, it offers an adaptive and flexible method for improving LLM performance instead of relying on pre-trained models. This project explores different methodologies to augment a multi-model unit test generation framework including the use of RL to train its test generation capabilities. Using datasets derived from LeetCode and PyMethods2Test, our tool is evaluated against strong baseline LLMs like Gemini and Claude. The results show that the PPO-trained DeepSeek model consistently outperforms baseline generation, achieving higher test pass rates, fewer syntax errors, and improved coverage and mutation scores across both datasets, demonstrating that our framework presents an effective unit test generation method.

  • PDF Download Icon
  • Research Article
  • Cite Count Icon 5
  • 10.1007/s10664-024-10451-x
Toward granular search-based automatic unit test case generation
  • May 17, 2024
  • Empirical Software Engineering
  • Fabiano Pecorelli + 4 more

Unit testing verifies the presence of faults in individual software components. Previous research has been targeting the automatic generation of unit tests through the adoption of random or search-based algorithms. Despite their effectiveness, these approaches aim at creating tests by solely optimizing metrics like code coverage, without ensuring that the resulting tests have granularities that would allow them to verify both the behavior of individual production methods and the interaction between methods of the class under test. To address this limitation, we propose a two-step systematic approach to the generation of unit tests: we first force search-based algorithms to create tests that cover individual methods of the production code, hence implementing the so-called intra-method tests; then, we relax the constraints to enable the creation of intra-class tests that target the interactions among production code methods. The assessment of our approach is conducted through a mixed-method research design that combines statistical analyses with a user study. The key results report that our approach is able to keep the same level of code and mutation coverage while providing test suites that are more structured, more understandable and aligned to the design principles of unit testing.

  • Conference Article
  • Cite Count Icon 3
  • 10.5753/sbes.2024.3561
Detecting Test Smells in Python Test Code Generated by LLM: An Empirical Study with GitHub Copilot
  • Sep 30, 2024
  • Victor Anthony Alves + 3 more

Writing unit tests is a time-consuming and labor-intensive development practice. Consequently, various techniques for automatically generating unit tests have been studied. Among them, the use of Large Language Models (LLMs) has recently emerged as a popular approach for automatically generating tests from natural language descriptions. Although many recent studies are dedicated to measuring the ability of LLMs to write valid unit tests, few evaluate the quality of these generated tests. In this context, this study aims to measure the quality of the test codes generated by GitHub Copilot in Python by detecting test smells in the test cases generated. To do this, we used approaches to generating unit tests by LLMs that have already been applied in the literature and collected a sample of 194 unit test cases in 30 Python test files. We then measured them using tools specialized in detecting test smells in Python. Finally, we conducted an evaluation of these test cases with software developers and software quality assurance professionals. Our results indicated that 47.4% of the tests generated by Copilot had at least one test smell detected, with a lack of documentation in the assertions being the most common quality problem. These findings indicate that although GitHub Copilot can generate valid unit tests, quality violations are still frequently found in these codes.

  • Conference Article
  • Cite Count Icon 8
  • 10.1109/icodse.2015.7437005
Unit test code generator for lua programming language
  • Nov 1, 2015
  • Junno Tantra Pratama Wibowo + 2 more

Software testing is an important step in the software development lifecycle. One of the main process that take lots of time is developing the test code. We propose an automatic unit test code generation to speed up the process and helps avoiding repetition. We develop the unit test code generator using Lua programming language. Lua is a fast, lightweight, embeddable scripting language. It has been used in many industrial applications with focuses on embedded systems and games. Unlike other popular scripting language like JavaScript, Python, and Ruby, Lua does not have any unit test generator developed to help its software testing process. The final product, Lua unit test generator (LUTG), integrated to one of the most popular Lua IDE, ZeroBrane Studio, as a plugin to seamlessly connect the coding and testing process. The code generator can generate unit test code, save test cases data on Lua and XML file format, and generate the test data automatically using search-based technique, genetic algorithm, to achieve full branch coverage test criteria. Using this generator to test several Lua source code files shows that the developed unit test generator can help the unit testing process. It was expected that the unit test generator can improve productivity, quality, consistency, and abstraction of unit testing process.

  • Research Article
  • Cite Count Icon 1
  • 10.21609/jiki.v17i1.1198
Implementation Genetic Algorithm for Optimization of Kotlin Software Unit Test Case Generator
  • Feb 25, 2024
  • Jurnal Ilmu Komputer dan Informasi
  • Mohammad Andiez Satria Permana + 2 more

Unit testing has a significant role in software development and its impacts depend on the quality of test cases and test data used. To reduce time and effort, unit test generator systems can help automatically generate test cases and test data. However, there is currently no unit test generator for Kotlin programming language even though this language is popularly used for android application developments. In this study, we propose and develop a test generator system that utilizes genetic algorithm (GA) and ANTLR4 parser. GA is used to obtain the most optimal test cases and data for a given Kotlin code. ANTLR4 parser is used to optimize the mutation process in GA so that the mutation process is not totally random. Our model results showed that the average value of code coverage in generated unit tests against instruction coverage is 95.64%, with branch coverage of 76.19% and line coverage of 96.87%. In addition, only two out of eight generated classes produced duplicate test cases with a maximum of one duplication in each class. Therefore, it can be concluded that our optimization with GA on the unit test generator is able to produce unit tests with high code coverage and low duplication.

  • Book Chapter
  • Cite Count Icon 46
  • 10.1007/978-3-030-59762-7_2
Automated Unit Test Generation for Python
  • Jan 1, 2020
  • Stephan Lukasczyk + 2 more

Automated unit test generation is an established research field, and mature test generation tools exist for statically typed programming languages such as Java. It is, however, substantially more difficult to automatically generate supportive tests for dynamically typed programming languages such as Python, due to the lack of type information and the dynamic nature of the language. In this paper, we describe a foray into the problem of unit test generation for dynamically typed languages. We introduce Pynguin, an automated unit test generation framework for Python. Using Pynguin, we aim to empirically shed light on two central questions: (1) Do well-established search-based test generation methods, previously evaluated only on statically typed languages, generalise to dynamically typed languages? (2) What is the influence of incomplete type information and dynamic typing on the problem of automated test generation? Our experiments confirm that evolutionary algorithms can outperform random test generation also in the context of Python, and can even alleviate the problem of absent type information to some degree. However, our results demonstrate that dynamic typing nevertheless poses a fundamental issue for test generation, suggesting future work on integrating type inference.

Save Icon
Up Arrow
Open/Close
Notes

Save Important notes in documents

Highlight text to save as a note, or write notes directly

You can also access these Documents in Paperpal, our AI writing tool

Powered by our AI Writing Assistant