Symbolic Execution for Runtime Error Detection and Investigation of Refactoring Activities Based on a New Dataset

István Kádár

doi:10.14232/phd.4138

Abstract

It is a big challenge in software engineering to produce huge, reliable and robust software systems. In industry, developers typically have to focus on solving problems quickly. The importance of code quality in time pressure is frequently secondary. However, software code quality is very important, because a too complex, hard-to-maintain code results in more bugs, and makes further development more expensive. The research work behind this thesis is inspired by the wish to develop high quality software systems in industry in a more effective and easier way to make the lives of customers and eventually end-users more comfortable and more effective. The thesis consists of two main topics: the utilization of symbolic execution for runtime error detection and the investigation of practical refactoring activity. Both topics address the area of program source code quality. Symbolic execution is a program analysis technique which explores the possible execution paths of a program by handling the inputs as unknown variables (symbolic variables). The main usages of symbolic execution are generating inputs of program failure and high-coverage test cases. It is able to expose defects that would be very difficult and timeconsuming to find through manual testing, and would be exponentially more costly to fix if they were not detected until runtime. In this work, we focus on runtime error detection (such as null pointer dereference, bad array indexing, division by zero, etc.) by discovering critical execution paths in Java programs. One of the greater challenges in symbolic execution is the very large number of possible execution paths, which increas exponentially. Our research proposes approaches to handling the aforementioned problem of path explosion by applying symbolic execution at the level of methods. We also investigated the limitations of this state space together with the development of efficient search heuristics. To make the detection of runtime errors more accurate, we propose a novel algorithm that keeps track of the conditions above symbolic variables during the analysis. Source code refactoring is a popular and powerful technique for improving the internal structure of software systems. The concept of refactoring was introduced by Martin Fowler. He originally proposed that detecting code smells should be the primary technique for identifying refactoring opportunities in the code. However, we lack empirical research results on how, when and why refactoring is used in everyday software development, what are its effects on short- and long-term maintainability and costs. By getting answers to these questions, we could understand how developers refactor code in practice, which would help propose new methods and tools for them that are aligned with their current habits leading to more effective software engineering methodologies in the industry. To help further empirical investigations of code refactoring, we proposed a publicly available refactoring dataset. The dataset consists of refactorings and source code metrics of open-source Java systems. We subjected the dataset to an analysis of the effects of code refactoring on source code metrics and maintainability, which are primary quality attributes in software development.

Full Text