An empirical study of code reuse between GitHub and Stack Overflow during software development

Yuan Huang, Xiangping Chen, Xiaocong Zhou, Zibin Zheng, Furen Xu

doi:10.1016/j.jss.2024.111964

Abstract

With the rise of programming Q&A websites (e.g., Stack Overflow) and the open-source movement, code reuse has become a common phenomenon. This paper presents a comprehensive study of programmers' code reuse behavior during software development, focusing on reuse between the code snippets in the commits of open-source projects and the code snippets on Stack Overflow (SO). We construct an open-source Java project dataset containing 793 projects with 342,148 modified code snippets, and an SO code dataset containing 1,355,617 posts. We then employ a code clone detection tool to identify instances of code reuse between the modified code snippets of commits and the code snippets of SO posts. We find that the average code reuse ratio of the projects is 6.32%, and that it is expected to trend significantly upward in the future. Additionally, we find that experienced developers seem more likely to reuse code from SO, and that they prefer to reuse posts with more favorites and higher scores. We combine deep learning and topic analysis algorithms to exploit the semantic information of SO posts. The results show that the distribution of reused post types differs between bug-related and non-bug-related commits. We also discover that the code reuse ratio in Java class files that have undergone multiple modifications (14.44%) is more than double the overall code reuse ratio (6.32%). Finally, we discuss the reuse habits of programmers and find that they may refer to multiple posts in a single reuse, and that these posts are related to some extent. From these results, our study provides multiple practical insights for different stakeholders: researchers, developers, and the SO platform.

Full Text