Public Software Development Activity During the Pandemic
Background: The emergence of the COVID-19 pandemic has impacted all human activity, including software development. Early reports seem to indicate that the pandemic may have had a negative effect on software developers, socially and personally, but that their software development productivity may not have been negatively impacted. Aims: Early reports about the effects of the pandemic on software development focused on software developers' well-being and on their productivity as employees. We are interested in a different aspect of software development: the developers' public contributions, as seen in GitHub and Stack Overflow activities. Did the pandemic affect the developers' public contributions and, if so, in what way? Method: Considering data from 2017 through 2020, we study the trends within GitHub's push, create, pull request, and release events, and within Stack Overflow's new users, posts, votes, and comments. We performed linear regressions, correlation analyses, outlier analyses, and hypothesis testing, and we also contacted individual developers in order to gather qualitative insights about their unusual public contributions. Results: Our study shows that within GitHub and Stack Overflow, the onset of the pandemic (March/April 2020) is reflected in a set of outliers in developers' contributions that point to an increase in activity. The distributions of contributions during the entire year of 2020 were different from the recent past in some aspects but similar in others. Additionally, we found one noticeably disrupted pattern of contribution in Stack Overflow, namely the ratio Questions/Answers, which was much higher in 2020 than before. Testimonials from the developers we contacted were mixed: while some developers reported that their increase in activity was due to the pandemic, others reported that it was not.
Conclusion: In GitHub, there was a noticeable increase in public software development activity in 2020, as well as more abrupt changes in daily activities; in Stack Overflow, there was a noticeable increase in new users and new questions at the onset of the pandemic, and in the ratio of Questions/Answers during 2020. The results may be attributed to the pandemic, but other factors could have come into play.
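The abstract names linear regression, outlier analysis, and the Questions/Answers ratio but does not say how each was computed. The following is a minimal sketch of one way such an analysis could look, not the authors' method: it fits a least-squares trend to a series of activity counts and flags points whose residuals fall outside the interquartile-range fences. The function names, the 1.5 x IQR rule, and the synthetic data are all assumptions made for illustration.

```python
from statistics import mean, quantiles


def linear_trend(values):
    """Least-squares fit of values against their index; returns (slope, intercept)."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residual_outliers(values, k=1.5):
    """Indices whose residual from the fitted trend lies outside the IQR fences.

    The k * IQR rule is an assumed, commonly used convention; the paper's
    actual outlier criterion is not given in the abstract.
    """
    slope, intercept = linear_trend(values)
    residuals = [y - (slope * x + intercept) for x, y in enumerate(values)]
    q1, _, q3 = quantiles(residuals, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, r in enumerate(residuals) if r < low or r > high]


def question_answer_ratio(questions, answers):
    """Ratio of new questions to new answers over some period."""
    return questions / answers


# Synthetic monthly counts (not real GitHub or Stack Overflow data):
# a steady upward trend with one artificial spike at index 15.
counts = [100 + 2 * i for i in range(24)]
counts[15] += 80
spikes = residual_outliers(counts)
```

On this synthetic series only the injected spike is flagged. With real event counts, seasonality (weekends, holidays) would likely need to be handled before residuals are compared.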
- Research Article
60
- 10.1108/dta-07-2017-0054
- Feb 9, 2018
- Data Technologies and Applications
Purpose: Software developers extensively use Stack Overflow (SO) for knowledge sharing on software development. Thus, software engineering researchers have started mining the structured/unstructured data present in certain software repositories, including the Q&A software developer community SO, with the aim of improving software development. The purpose of this paper is to show how academics/practitioners can benefit from the valuable user-generated content shared on various online social networks, specifically from the Q&A community SO, for software development. Design/methodology/approach: A comprehensive literature review was conducted, and 166 research papers on SO about software development, from the inception of SO till June 2016, were categorized. Findings: Most of the studies revolve around a limited number of software development tasks; approximately 70 percent of the papers used millions of posts data, applied basic machine learning methods, and conducted semi-automatic investigations and quantitative studies. Thus, future research should focus on overcoming the identified challenges and gaps. Practical implications: The work on SO is classified into two main categories: “SO design and usage” and “SO content applications.” These categories not only give insights to Q&A forum providers about the shortcomings in design and usage of such forums but also provide ways to overcome them in future. It also enables software developers to exploit such forums for the identified under-utilized tasks of software development. Originality/value: The study is the first of its kind to explore the work on SO about software development and makes an original contribution by presenting a comprehensive review, design/usage shortcomings of Q&A sites, and future research challenges.
- Research Article
1
- 10.1016/0066-4138(90)90011-f
- Jan 1, 1988
- Annual Review in Automatic Programming
Methods for monitoring productivity in applicative software development
- Research Article
- 10.1016/s1474-6670(17)53682-4
- Sep 1, 1988
- IFAC Proceedings Volumes
Methods for Monitoring Productivity in Applicative Software Development
- Research Article
104
- 10.1007/s10664-018-9650-5
- Oct 1, 2018
- Empirical Software Engineering
Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of copyable code snippets. Using those snippets raises maintenance and legal issues. SO’s license (CC BY-SA 3.0) requires attribution, i.e., referencing the original question or answer, and requires derived work to adopt a compatible license. While there is a heated debate on SO’s license model for code snippets and the required attribution, little is known about the extent to which snippets are copied from SO without proper attribution. We present results of a large-scale empirical study analyzing the usage and attribution of non-trivial Java code snippets from SO answers in public GitHub (GH) projects. We followed three different approaches to triangulate an estimate for the ratio of unattributed usages and conducted two online surveys with software developers to complement our results. For the different sets of projects that we analyzed, the ratio of projects containing files with a reference to SO varied between 3.3% and 11.9%. We found that at most 1.8% of all analyzed repositories containing code from SO used the code in a way compatible with CC BY-SA 3.0. Moreover, we estimate that at most a quarter of the copied code snippets from SO are attributed as required. Of the surveyed developers, almost one half admitted copying code from SO without attribution and about two thirds were not aware of the license of SO code snippets and its implications.
- Conference Article
10
- 10.1109/msr52588.2021.00053
- May 1, 2021
Software developers are social creatures: they communicate, collaborate, and promote their work in a variety of channels. Twitter, GitHub, Stack Overflow, and other platforms offer developers opportunities to network and exchange ideas. Researchers analyze content on these sites to learn about trends and topics in software engineering. However, insight mined from the text of Stack Overflow questions or GitHub issues is highly focused on detailed and technical aspects of software development. In this paper, we present a relatively new online community for software developers called DEV. On DEV users write long-form posts about their experiences, preferences, and working life in software, zooming out from specific issues and files to reflect on broader topics. About 50,000 users have posted over 140,000 articles related to software development. In this work, we describe the content of posts on DEV using a topic model, showing that developers discuss a rich variety and mixture of social and technical aspects of software development. We show that developers use DEV to promote themselves and their work: 83% link their profiles to their GitHub profiles and 56% to their Twitter profiles. 14% of users pin specific GitHub repos in their profiles. We argue that DEV is emerging as an important hub for software developers, and a valuable source of insight for researchers to complement data from platforms like GitHub and Stack Overflow.
- Research Article
10
- 10.1016/j.jss.2024.111964
- Jan 8, 2024
- Journal of Systems and Software
An empirical study of code reuse between GitHub and Stack Overflow during software development
- Research Article
6
- 10.1142/s0218194021500467
- Oct 1, 2021
- International Journal of Software Engineering and Knowledge Engineering
GitHub and Stack Overflow are often used together for software development. GH-SO users, who use both GitHub and Stack Overflow, contribute to the development of various software projects in GitHub and share their knowledge and experience on software development in Stack Overflow. To broadly understand the interests and working habits of GH-SO users in software development, it is important to investigate how GH-SO users utilize GitHub and Stack Overflow. In this paper, we present an exploratory study on the GitHub commit and Stack Overflow post activities of GH-SO users. Specifically, we investigate the working habits of GH-SO users in GitHub commit and Stack Overflow post activities. We randomly selected 19,756 GH-SO users as our target sample and collected 2,819,483 commit activities and 2,147,317 post activities of these users. We then categorized the collected commit and post activity datasets by programming language and statistically analyzed the categorized datasets. Our analysis found the following: (1) The overall commit and post activities of the GH-SO users share some similarity. (2) The commit activities change gradually while the post activities change drastically over time. (3) The commit activities of the GH-SO users are broadly distributed while the post activities are narrowly distributed, and commit activity can be a better predictor of post activity. (4) The commit activity of the GH-SO users tends to precede post activity. We believe that our findings can contribute to finding ways to better support the commit and post activities of GitHub and Stack Overflow users.
- Conference Article
261
- 10.1145/2884781.2884800
- May 14, 2016
Software developers need access to different kinds of information which is often dispersed among different documentation sources, such as API documentation or Stack Overflow. We present an approach to automatically augment API documentation with "insight sentences" from Stack Overflow, that is, sentences that are related to a particular API type and that provide insight not contained in the API documentation of that type. Based on a development set of 1,574 sentences, we compare the performance of two state-of-the-art summarization techniques as well as a pattern-based approach for insight sentence extraction. We then present SISE, a novel machine learning based approach that uses as features the sentences themselves, their formatting, their question, their answer, and their authors as well as part-of-speech tags and the similarity of a sentence to the corresponding API documentation. With SISE, we were able to achieve a precision of 0.64 and a coverage of 0.7 on the development set. In a comparative study with eight software developers, we found that SISE resulted in the highest number of sentences that were considered to add useful information not found in the API documentation. These results indicate that taking into account the metadata available on Stack Overflow as well as part-of-speech tags can significantly improve unsupervised extraction approaches when applied to Stack Overflow data.
- Research Article
7
- 10.4236/jsea.2011.411072
- Jan 1, 2011
- Journal of Software Engineering and Applications
In this paper, we identify a set of factors that may be used to forecast software productivity and software development time. Software productivity was measured in function points per person hours, and software development time was measured in number of elapsed days. Using field data on over 130 field software projects from various industries, we empirically test the impact of team size, integrated computer aided software engineering (ICASE) tools, software development type, software development platform, and programming language type on the software development productivity and development time. Our results indicate that team size, software development type, software development platform, and programming language type significantly impact software development productivity. However, only team size significantly impacts software development time. Our results indicate that effective management of software development teams, and using different management strategies for different software development type environments may improve software development productivity.
- Conference Article
7
- 10.1145/3196321.3196348
- May 28, 2018
Developers introduce bugs during software development which reduce software reliability. Many of these bugs are commonly occurring and have been experienced by many other developers. Informing developers, especially novice ones, about commonly occurring bugs in a domain of interest (e.g., Java) can help developers comprehend programs and avoid similar bugs in the future. Unfortunately, information about commonly occurring bugs is not readily available. To address this need, we propose a novel approach named RFEB which recommends frequently encountered bugs (FEBugs) that may affect many other developers. RFEB analyzes Stack Overflow, which is the largest software engineering-specific Q&A community. Among the many questions posted on Stack Overflow, many provide descriptions and solutions of different kinds of bugs. Unfortunately, the search engine that comes with Stack Overflow is not able to identify FEBugs well. To address this limitation of Stack Overflow's search engine, we propose RFEB, an integrated and iterative approach that considers both relevance and popularity of Stack Overflow questions to identify FEBugs. To evaluate the performance of RFEB, we perform experiments on a dataset from Stack Overflow which contains more than ten million posts. We compared our model with Stack Overflow's search engine on 10 domains, and the experiment results show that RFEB achieves an average NDCG@10 score of 0.96, which improves on Stack Overflow's search engine by 20%.
- Conference Article
20
- 10.1145/3290607.3312801
- May 2, 2019
Software developers use Stack Overflow on a daily basis to search for solutions to problems they encounter during bug fixing and feature enhancement. In prior work, studies have been done on mining Stack Overflow data such as for predicting unanswered questions or how and why people post. However, no work exists on how developers actually use, or more importantly, read the information presented to them on Stack Overflow. To better understand this behavior, we conduct an eye tracking study on how developers seek information on Stack Overflow while tasked with creating human-readable summaries of methods and classes in large Java projects. Eye gaze data is collected on both the source code elements and Stack Overflow document elements at a fine token-level granularity using iTrace, our eye tracking infrastructure. We found that developers look at the text more often than the title in posts. Code snippets were the second most looked at element. Tags and votes are rarely looked at. When switching between Stack Overflow and the Eclipse Integrated Development Environment (IDE), developers often looked at method signatures and then switched to code and text elements on Stack Overflow. Such heuristics provide insight to automated code summarization tools as they decide what to give more weight to while generating summaries.
- Dissertation
1
- 10.32657/10356/75873
- Jan 1, 2018
With software penetrating into all kinds of traditional or emerging industries, there is a great demand on software development. Faced with the fact that there is a limited number of developers, one important way to meet such urgent needs is to significantly improve developers’ productivity. As the most popular Q&A site, Stack Overflow has accumulated abundant software development knowledge. Effectively leveraging such a big data can help developers reuse the experience there to further improve their working efficiency. However, the rich yet unstructured large-scale data in Stack Overflow makes it difficult to search due to two reasons. First, there are too many questions and answers within the site, and there may be lingual gap (the same meaning can be written in different languages) between the query and content in Stack Overflow. In addition, the decay of information quality such as misspelling, inconsistency, and abuse of domain-specific abbreviations aggravates the search performance. Second, some higher-order knowledge in Stack Overflow is implicit for searching and it needs certain distillation from existing raw data. In this thesis, I present methods for supporting developers’ information search over Stack Overflow. To overcome the lexical gap and information decay, I also develop an edit recommendation tool to ensure the post quality of Stack Overflow so that posts can be more easily searched by the query. But such explicit information search still requires developers to read, understand and summarize, which is time-consuming. So I propose to shift from the document (information) search to entity (knowledge) search by mining the implicit knowledge from tags in Stack Overflow to render direct answers to developers instead of asking them to read lengthy documents. I first build a basic software-specific knowledge graph including thousands of software-engineering terms and their associations by association rule mining and community detection. 
Then, I enrich the knowledge graph with more fine-grained relationships i.e., analogy among different third-party libraries. Finally, I combine both semantic and lexical information to infer morphological forms of software terms so that the knowledge graph is more robust for knowledge search.
- Conference Article
270
- 10.1109/sp.2017.31
- May 1, 2017
S.121-136
- Conference Article
18
- 10.1145/3468264.3473114
- Aug 18, 2021
Stack Overflow is one of the most popular technical Q&A sites used by software developers. Seeking help from Stack Overflow has become an essential part of software developers’ daily work for solving programming-related questions. Although the Stack Overflow community has provided quality assurance guidelines to help users write better questions, we observed that a significant number of questions submitted to Stack Overflow are of low quality. In this paper, we introduce a new web-based tool, Code2Que, which can help developers in writing higher quality questions for a given code snippet. Code2Que consists of two main stages: offline learning and online recommendation. In the offline learning phase, we first collect a set of good quality ⟨code snippet, question⟩ pairs as training samples. We then train our model on these training samples via a deep sequence-to-sequence approach, enhanced with an attention mechanism, a copy mechanism and a coverage mechanism. In the online recommendation phase, for a given code snippet, we use the offline trained model to generate question titles to assist less experienced developers in writing questions more effectively. To evaluate Code2Que, we first sampled 50 low quality ⟨code snippet, question⟩ pairs from the Python and Java datasets on Stack Overflow. Then we conducted a user study to evaluate the question titles generated by our approach as compared to human-written ones using three metrics: Clearness, Fitness and Willingness to Respond. Our experimental results show that for a large number of low-quality questions in Stack Overflow, Code2Que can improve the question titles in terms of Clearness, Fitness and Willingness measures.
- Research Article
24
- 10.1016/j.jss.2022.111427
- Jun 28, 2022
- Journal of Systems and Software
Impact of individualism and collectivism cultural profiles on the behaviour of software developers: A study of Stack Overflow