Abstract

When a person decides to inspect or modify a third-party software project, the first necessary step is its successful compilation from source code using a build system. However, such attempts often end in failure. In this data descriptor paper, we provide a dataset of build results of open source Java software systems. We tried to automatically build a large number of Java projects from GitHub using their Maven, Gradle, and Ant build scripts in a Docker container simulating a standard programmer’s environment. The dataset consists of the output of two executions: 7264 build logs from a study executed in 2016 and 7233 logs from the 2020 execution. In addition to the logs, we collected exit codes, file counts, and various project metadata. The proportion of failed builds in our dataset is 38% in the 2016 execution and 59% in the 2020 execution. The published data can be helpful for multiple purposes, such as correlation analysis of factors affecting build success, build failure prediction, and research in the area of build breakage repair.

Highlights

  • There are many possible situations when a person would like to build a third-party software system from source code

  • A student may want to contribute to a popular open source application

  • We aimed to focus on pure Java and excluded projects utilizing separate ecosystems


Summary

Introduction

There are many possible situations in which a person would like to build a third-party software system from source code. A researcher could be searching for applications suitable for experiments. In all of these situations, the person downloads the source code from a software forge such as GitHub and tries to build it using the supplied build script. This process often ends in failure. None of the mentioned studies and datasets includes local (non-CI) build results and logs of a large number of open source projects. In this paper, we describe a dataset resulting from a study that tried to build thousands of Java projects from GitHub. For each project, our script (i) downloaded the source code, (ii) determined which build tool (Gradle, Maven, or Ant) the project uses, and (iii) tried to execute the corresponding command. We provide a brief comparison of the results from the 2016 and 2020 executions.
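
Step (ii) of the procedure above can be illustrated with a simple file-presence check. The following is a minimal sketch, not the study's actual script: it assumes each tool is recognized by its default build file in the project root (`build.gradle` for Gradle, `pom.xml` for Maven, `build.xml` for Ant). The function name `detect_build_tool` and the precedence used when several build files coexist are assumptions made for this example.

```python
from pathlib import Path
from typing import Optional

# Default build-file name for each tool, checked in this order.
# The precedence (Gradle, then Maven, then Ant) is an assumption of this
# sketch, not something stated in the paper summary above.
BUILD_FILES = [
    ("gradle", "build.gradle"),
    ("maven", "pom.xml"),
    ("ant", "build.xml"),
]


def detect_build_tool(project_dir: str) -> Optional[str]:
    """Return 'gradle', 'maven', or 'ant' if a matching build file exists
    in the project root, or None if no known build file is found."""
    root = Path(project_dir)
    for tool, filename in BUILD_FILES:
        if (root / filename).is_file():
            return tool
    return None
```

A project containing only `pom.xml` in its root would be classified as a Maven project, while a directory with none of the three files yields `None`, which a driver script could record as "no supported build tool".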

Method
Inclusion Criteria
Project Downloading
Build Tool Selection
Environment
Build Tool Execution
Error Analysis
Directory Structure
Results File
Build Tools
Build Status
Error Categories
Potential Applications
