A novel methodology to classify test cases using natural language processing and imbalanced learning

Sahar Tahvili, Leo Hatvani, Enislay Ramentol, Rita Pimentel, Wasif Afzal, Francisco Herrera

doi:10.1016/j.engappai.2020.103878

Abstract

Detecting dependencies between integration test cases plays a vital role in software test optimization. Classifying test cases into two main classes (dependent and independent) can serve several test optimization purposes, such as parallel test execution, test automation, test case selection and prioritization, and test suite reduction. This task can be seen as an imbalanced classification problem because of how the test cases are distributed: the numbers of dependent and independent test cases are often uneven, depending on the testing level, the testing environment, and the complexity of the system under test. In this study, we propose a novel methodology that consists of two main steps. First, using natural language processing, we analyze the test cases’ specifications and turn each of them into a numeric vector. Second, using the obtained vectors, we classify each test case as dependent or independent. We carry out a supervised learning approach using different methods for handling imbalanced datasets. The feasibility and possible generalization of the proposed methodology are evaluated in two industrial projects at Bombardier Transportation, Sweden, and the results are promising.
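The two steps described in the abstract can be sketched in a few lines. This is only an illustration under loud assumptions: the paper uses Doc2Vec for step one, whereas the `embed` function below is a much simpler hashed bag-of-words stand-in, and the nearest-centroid rule in `fit` and `predict` is a placeholder for whichever supervised classifier is actually used. All function names, the vector size `DIM`, and any test specification text passed in are hypothetical.

```python
import hashlib
import math
from collections import Counter

DIM = 16  # hypothetical vector size; real document embeddings are usually larger


def embed(text, dim=DIM):
    """Step 1 stand-in (NOT Doc2Vec): hash each token into one of `dim` buckets,
    then scale the vector to unit length (an all-zero vector stays all-zero)."""
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit(specs, labels):
    """Step 2 stand-in: compute one centroid per class from labeled specifications."""
    by_class = {}
    for spec, label in zip(specs, labels):
        by_class.setdefault(label, []).append(embed(spec))
    return {label: centroid(vecs) for label, vecs in by_class.items()}


def predict(model, spec):
    """Assign the class whose centroid is closest in Euclidean distance."""
    vec = embed(spec)
    return min(model, key=lambda label: math.dist(vec, model[label]))
```

In use, `fit` would receive the textual test specifications together with their ground-truth labels ("dependent" or "independent"), and `predict` would then label an unseen specification. A learned embedding such as Doc2Vec captures word context that this hashing stand-in ignores.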

Highlights

Software testing is an important and effort-intensive activity in the software development life cycle (SDLC), and test optimization plays a vital role in the testing domain.
We provide background on natural language processing (NLP) and on Doc2Vec, the method used in this paper.
The approach can be summarized as: (1) improving the results obtained from the clustering approach in Tahvili et al. (2019); (2) employing the ground truth to label the data, thereby enabling a supervised learning approach; and (3) applying a different methodology for classifying test cases into dependent and independent.

Summary

Introduction

Software testing is an important and effort-intensive activity in the software development life cycle (SDLC), and test optimization plays a vital role in the testing domain. Manual testing is still a popular approach, especially for safety-critical systems, where assurance arguments depend on human judgment (Chechik et al., 2019). We aim to split manual integration test cases into two main classes: dependent and independent. The first step is to obtain a numeric dataset. To this end, we use a neural network model that turns the manual descriptions of the test cases into numeric vectors of descriptive features. In many cases, dependent and independent test cases are unevenly distributed, which means that we are facing a class imbalance problem. A variety of solutions for class imbalance problems have been proposed in recent years. They can be divided into four main groups: data-level approaches, specifically designed classification algorithms, cost-sensitive learning, and ensembles.
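To make two of the four groups concrete, here is a minimal sketch of one data-level technique (random oversampling of the minority class) and one cost-sensitive ingredient (class weights inversely proportional to class frequency). These are generic textbook techniques chosen for illustration; this summary does not say which specific methods the paper applies. The function names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Data-level fix: duplicate randomly chosen samples of each smaller class
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Cost-sensitive ingredient: weight each class by
    n_samples / (n_classes * class_count), so that errors on the
    rarer class cost more during training."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}
```

Oversampling changes the training data and leaves the learner untouched, while class weights leave the data untouched and change the loss the learner minimizes. With 8 independent and 2 dependent test cases, for example, the weights come out as 0.625 and 2.5 respectively.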

Preliminaries

Dependency detection

On the use of natural language processing

On the use of imbalance learning

Related work

Dependency identification in software testing

Natural language processing in software testing

The methodology

Classification

Industrial case studies

Case studies

The ground truth

Experimental evaluation

Parameters

Method

Model evaluation using unseen data

Threats to validity

Concluding remarks

Findings

From a software testing point of view:


Journal: Engineering Applications of Artificial Intelligence
Publication Date: Aug 14, 2020
Citations: 25
License type: cc-by


