Abstract

This paper presents two metrics designed to measure the data uniformity of acceptance tests written in the FitNesse and Gherkin notations. The objective is to measure the data uniformity of acceptance tests in order to identify projects that contain large amounts of random, meaningless data. Random data in acceptance tests hinder communication between stakeholders and increase the volume of glue code. The main contribution of this paper is the implementation of the proposed metrics. The paper also evaluates the uniformity of test data from several FitNesse and Gherkin projects found on GitHub, in order to verify whether the metrics are applicable. First, the metrics were applied to 18 FitNesse project repositories and 18 Gherkin project repositories. The measurements taken from these repositories were used to present cases of irregular and uniform test data. Then, we compared the FitNesse and Gherkin notations in terms of projects and features. In terms of projects, no significant difference was observed; that is, FitNesse projects have a level of uniformity similar to that of Gherkin projects. In terms of features and test documents, however, there was a significant difference: the uniformity scores of FitNesse and Gherkin features are 0.16 and 0.26, respectively. Both scores are very low, which means that test data in both notations are very irregular; since 0.16 is lower than 0.26, we can infer that test data are more irregular in FitNesse features than in Gherkin features. The evaluation also shows that 28 of the 36 projects (78%) did not reach the minimum recommended measure of 0.45 for test data uniformity. Overall, many challenges remain in improving the quality of acceptance tests, especially with respect to the uniformity of test data.
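
This excerpt does not reproduce the metric definitions themselves, so the sketch below is only a minimal illustration of the underlying idea: scoring a set of Gherkin-style steps by how often their literal data values recur. The function uniformity_score, the data-extraction regular expression, and the value-reuse scoring rule are assumptions made for illustration; they are not the metrics proposed in the paper.

    from collections import Counter
    import re

    # Literal test data in a step: quoted strings or numeric literals.
    # (Hypothetical extraction rule, not the paper's definition.)
    DATA = re.compile(r'"[^"]*"|\d+(?:\.\d+)?')

    def uniformity_score(steps):
        """Fraction of extracted data values that appear more than once."""
        values = [v for step in steps for v in DATA.findall(step)]
        if not values:
            return 1.0  # no test data at all is trivially uniform
        counts = Counter(values)
        reused = sum(n for n in counts.values() if n > 1)
        return reused / len(values)

    uniform_steps = [
        'Given the account "alice" has a balance of 100',
        'When "alice" withdraws 40',
        'Then the balance of "alice" is 60',
    ]
    random_steps = [
        'Given the account "x7kq93" has a balance of 9382.17',
        'When "jh4wz" withdraws 13.02',
        'Then the balance of "p0rrt" is 55531.99',
    ]

    print(uniformity_score(uniform_steps))  # 0.5: "alice" recurs in every step
    print(uniformity_score(random_steps))   # 0.0: every value is a one-off

Under a rule of this kind, a score below the 0.45 threshold mentioned above would flag a project whose test data are dominated by one-off, random-looking values.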

Highlights

  • Analogous to Test-Driven Development (TDD) (Beck, 2003), Acceptance Test-Driven Development (ATDD) includes different stakeholders who collaborate to write acceptance tests before implementing system functionality (Gärtner, 2012)

  • We evaluate the uniformity of the acceptance test data of several projects that use the FitNesse and Gherkin notations and present a comparison of the uniformity between them

  • The comparison between FitNesse and Gherkin suggests that there are no differences in project sizes

Introduction

Analogous to Test-Driven Development (TDD) (Beck, 2003), Acceptance Test-Driven Development (ATDD) involves different stakeholders (client, developer, tester) who collaborate to write acceptance tests before implementing system functionality (Gärtner, 2012). Teams practicing ATDD generally find that merely defining acceptance tests and discussing their specifications leads to a better understanding of the requirements. This happens because acceptance tests tend to force a solid agreement on the exact behavior that is expected from the software (Hendrickson, 2008). According to Santos, Longo, and Vilain (2018), there are 21 techniques used to specify acceptance tests. Specifying software requirements through acceptance tests is an attempt to improve the quality of requirements. However, just as with requirements written in natural language, several problems can arise when requirements are specified using acceptance tests.
