GitHub Projects Research Articles

Continuous Integration (CI) is a set of software development practices that allow software development teams to generate software builds more quickly and periodically (e.g., daily or even hourly). CI brings many advantages, such as the early identification of errors when integrating code. When builds are generated frequently, a long build duration may keep developers from performing other important tasks. Recent research has shown that a considerable amount of development time is invested in optimizing the generation of builds. However, the reasons behind long build durations are still vague and need an in-depth study. Our initial investigation shows that many projects have build durations that far exceed the acceptable build duration (i.e., 10 minutes) reported by recent studies. In this paper, we study several characteristics of CI builds that may be associated with long build durations. We perform an empirical study on 104,442 CI builds from 67 GitHub projects. We use mixed-effects logistic models to model long build durations across projects. Our results reveal that, in addition to commonly assumed factors (e.g., project size, team size, build configuration size, and test density), there are other highly important factors that explain long build durations. We observe that rerunning failed commands multiple times is most likely to be associated with long build durations. We also find that builds may run faster if they are configured (a) to cache content that does not change often or (b) to finish as soon as all the required jobs finish. However, we observe that about 40% of the studied projects do not use or misuse such configurations in their builds. In addition, we observe that triggering builds on weekdays or at daytime is most likely to have a direct relationship with long build durations. Our results suggest that developers should use proper CI build configurations to maintain successful builds and to avoid long build durations. Tool builders should supply development teams with tools to identify cacheable spots of the project in order to accelerate the generation of CI builds.

Context: Better methods of evaluating the process performance of OSS projects can benefit decision makers who consider adopting OSS software in a company. This article studies the closure of issues (bugs and features) in GitHub projects, which is an important measure of OSS development process performance and of the quality of support that project users receive from the developer team. Objective: The goal of this article is a better understanding of the factors that affect issue closure rates in OSS projects. Methodology: The GHTorrent repository is used to select a large sample of mature, active OSS projects. Using survival analysis, we calculate short-term and long-term issue closure rates. We formulate several hypotheses regarding the impact of OSS project and team characteristics, such as measures of work centralization, measures that reflect internal project workflows, and developer social network measures, on issue closure rates. Based on the proposed features and several control features, a model is built that can predict issue closure rate. The model allows us to test our hypotheses. Results: We find that large teams that have many project members have lower issue closure rates than smaller teams. Similarly, increased work centralization improves issue closure rates. While desirable social network characteristics have a positive impact on the number of commits in a project, they do not have a significant influence on issue closure. Conclusion: Overall, findings from the empirical analysis support Brooks's classic notion of the “surgical team” in the context of OSS project development process performance on GitHub. The model of issue closure rates proposed in this article is a first step towards an improved understanding and prediction of this important measure of OSS development process performance.
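
As an illustration of how survival analysis can yield an issue closure rate, the following is a minimal sketch that assumes a Kaplan-Meier estimator. The abstract does not say which estimator the authors use, and the function names, the toy data, and the reading of "closure rate" as one minus the survival probability at a chosen horizon are all assumptions made for this example. Issues still open at the end of observation are treated as censored.

```python
from collections import Counter


def kaplan_meier(durations, closed):
    """Return [(t, S(t))] for each time t at which at least one issue closed.

    durations: days each issue was observed (until closure or end of observation)
    closed:    True if the issue was closed, False if still open (censored)
    """
    closures = Counter(t for t, c in zip(durations, closed) if c)
    exits = Counter(durations)      # issues leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = closures.get(t, 0)
        if d:
            # Probability of staying open past t, given still open just before t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def closure_rate(curve, horizon):
    """Estimated fraction of issues closed within `horizon` days."""
    survival = 1.0
    for t, s in curve:
        if t > horizon:
            break
        survival = s
    return 1.0 - survival


# Hypothetical toy data: six issues, two of them still open when observation ends.
durations = [1, 3, 3, 7, 10, 30]
closed = [True, True, False, True, False, True]
curve = kaplan_meier(durations, closed)
short_term = closure_rate(curve, 7)    # illustrative short-term horizon
long_term = closure_rate(curve, 30)    # illustrative long-term horizon
```

Censoring matters here: simply dividing closed issues by all issues would understate the closure rate for projects with many recently opened issues, which is one reason a survival-based estimate is a natural fit.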

Articles published on GitHub Projects

Siamese: scalable and incremental code clone search via multiple code representations

An empirical study of the long duration of continuous integration builds

What are the Characteristics of Reopened Pull Requests? A Case Study on Open Source Projects in GitHub

What Characterizes an Influencer in Software Ecosystems?

Scenario Oriented Program Slicing for Large-Scale Software Through Constraint Logic Programming and Program Transformation

Studying the Impact of Noises in Build Breakage Data

History-Driven Fix for Code Quality Issues

CodeAttention: translating source code to comments by exploiting the code constructs

Usage and attribution of Stack Overflow code snippets in GitHub projects

What’s in a GitHub Star? Understanding Repository Starring Practices in a Social Coding Platform

How Do Static and Dynamic Test Case Prioritization Techniques Perform on Modern Software Systems? An Extensive Study on GitHub Projects

Toxic Code Snippets on Stack Overflow

Unusual events in GitHub repositories

RevRec: A two-layer reviewer recommendation algorithm in pull-based development model

Surgical teams on GitHub: Modeling performance of GitHub project development processes

An Initial Step Towards Organ Transplantation Based on GitHub Repository

Internal quality assurance for external contributions in GitHub: An empirical investigation

Developer Identity Linkage and Behavior Mining Across GitHub and StackOverflow

The appropriation of GitHub for curation

Model Description Language (MDL): A Standard for Modeling and Simulation
