An Empirical Analysis on Reducing Open Source Software Development Tasks using Stack Overflow

Abstract

Objectives: We present a cross-repository analysis between Open Source Software (OSS) projects and a Community Question Answering (CQA) site, with the aim of speeding up the OSS development process.

Methods/Analysis: OSS development is becoming increasingly popular because source code, developer specifications, and bug lists are publicly available online, so anyone can contribute to the software by referring to these files. Similarly, Stack Overflow is an interactive CQA site that hosts programming-related questions and their answers online and has turned into a repository of software engineering knowledge. To track how such sites correlate with software development tasks, we use the two repositories to find the semantic similarity between bugs reported in OSS projects and Question and Answer (Q&A) posts on Stack Overflow. The semantic similarity is analyzed by integrating the contents of the repositories using a text mining approach. The relationship between a bug and a Q&A post is established through semantic similarity and metadata features.

Findings: The statistics of our analysis are presented for five OSS projects in terms of the number of bugs and the average bug fix time. The statistical results show that bug fix time can be reduced by posting the bugs to Stack Overflow.

Application/Improvement: The presented approach can be used to find Q&A posts similar to a reported OSS bug, helping developers of OSS projects resolve bugs quickly by leveraging users' programming skills in the form of Q&A posts.

Keywords: Open Source Software, Community Question Answering, Stack Overflow, Cross Repository Analysis, Bug Tracking System, Bug Fixing
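
The abstract does not say which similarity measure the text mining step uses. As a rough illustration only, the sketch below ranks Q&A posts against a bug report using TF-IDF weighting and cosine similarity, which is one common choice and an assumption here, not the authors' method. The function names and the sample bug and posts are hypothetical, and metadata features are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps a small positive weight for terms found everywhere.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_posts(bug_text, posts):
    """Return (score, post_index) pairs, most similar post first."""
    vectors = tfidf_vectors([bug_text] + posts)
    bug_vec, post_vecs = vectors[0], vectors[1:]
    scores = [(cosine(bug_vec, v), i) for i, v in enumerate(post_vecs)]
    return sorted(scores, reverse=True)


# Hypothetical example data, not taken from the paper.
bug = "NullPointerException when parsing empty XML config file"
posts = [
    "How to center a div with CSS flexbox",
    "Java XML parser throws NullPointerException on empty file",
    "Best way to read a config file in Python",
]
ranking = rank_posts(bug, posts)
for score, index in ranking:
    print(f"{score:.3f}  {posts[index]}")
```

In this toy data the Java XML post shares the most terms with the bug report and ranks first, while the CSS post shares none and scores zero. A real pipeline would likely add stop-word removal, stemming, and the metadata features the abstract mentions.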

Similar Papers
  • Conference Article
  • Cite Count: 8
  • 10.1109/empire.2015.7431307
How do open source software (OSS) developers practice and perceive requirements engineering? An empirical study
  • Aug 24, 2015
  • Jaison Kuriakose + 1 more

In the open source software (OSS) development domain (a largely volunteer driven, geographically distributed, web based form of software development), it is mainly the OSS developers who are responsible for overseeing and managing the developmental activities. Existing OSS literature, based on qualitative analysis of web-based artifacts (e.g. data on discussion forums, issue databases) of a few OSS projects, reports that requirements generation in OSS development is largely informal and ad hoc. But there is a lack of empirical studies involving the practitioners themselves, i.e. the OSS developers. We conducted a web-based survey among OSS developers in order to gain insights into how they actually practice requirements engineering activities and what their perceptions of it are. For 57 requirements engineering practices obtained from closed source software development (CSSD) literature, the respondents indicated whether they currently used those practices in their OSS projects and whether those practices were useful for OSS development. The analysis of survey responses revealed that OSS developers used requirements engineering practices (from CSSD) significantly less in their developmental activities than what they believed they should have, indicated through usefulness ratings. We also asked participating OSS developers to indicate their perceptions about the usage of five informal requirements generation activities reported in OSS literature (e.g. developers simply asserting the requirements instead of eliciting). Subsequent analysis revealed that OSS developers used informal requirements generation activities significantly more than requirements elicitation practices (from CSSD) in their developmental activities. We use the survey findings to discuss implications for practice and research.

  • Research Article
  • Cite Count: 1
  • 10.1109/tse.2025.3572027
How Do OSS Developers Reuse Architectural Solutions From Q&A Sites: An Empirical Study
  • Jul 1, 2025
  • IEEE Transactions on Software Engineering
  • Musengamana Jean De Dieu + 2 more

Developers reuse programming-related knowledge (e.g., code snippets) on Q&A sites (e.g., Stack Overflow) that functionally matches the programming problems they encounter in their development. Despite extensive research on Q&A sites, being a high-level and important type of development-related knowledge, architectural solutions (e.g., architecture tactics) and their reuse are rarely explored. To fill this gap, we conducted a mixed-methods study that includes a mining study and a survey study. For the mining study, we mined 984 commits and issues (i.e., 821 commits and 163 issues) from 893 Open-Source Software (OSS) projects on GitHub that explicitly referenced architectural solutions from Stack Overflow (SO) and Software Engineering Stack Exchange (SWESE). For the survey study, we identified practitioners involved in the reuse of these architectural solutions and surveyed 227 of them to further understand how practitioners reuse architectural solutions from Q&A sites in their OSS development. 
Our main findings are that: (1) OSS practitioners reuse architectural solutions from Q&A sites to solve a large variety (15 categories) of architectural problems, wherein Component design issue, Architectural anti-pattern, and Security issue are dominant; (2) Seven categories of architectural solutions from Q&A sites have been reused to solve those problems, among which Architectural refactoring, Use of frameworks, and Architectural tactic are the three most reused architectural solutions; (3) OSS developers often rely on ad hoc ways (e.g., informal, improvised, or unstructured approaches) to reuse architectural solutions from SO, drawing on personal experience and intuition rather than standardized or systematic practices; (4) Reusing architectural solutions from SO comes with a variety of challenges, e.g., OSS practitioners complain that they need to spend significant time to adapt such architectural solutions to address design concerns raised in their OSS development, and it is challenging to reuse architectural solutions that are not tailored to the design context of their OSS projects. 
Our findings pave the way for future research directions, including the design and development of approaches and tools (such as IDE plugin tools) to facilitate the reuse of architectural solutions from Q&A sites, and could also be used to offer guidelines to practitioners when they contribute architectural solutions to Q&A sites. Our dataset is publicly available at https://doi.org/10.5281/zenodo.10936098.

  • Research Article
  • Cite Count: 98
  • 10.1007/s10606-006-9020-5
A Methodological Framework for Socio-Cognitive Analyses of Collaborative Design of Open Source Software
  • Jun 1, 2006
  • Computer Supported Cooperative Work (CSCW)
  • Warren Sack + 5 more

Open Source Software (OSS) development challenges traditional software engineering practices. In particular, OSS projects are managed by a large number of volunteers, working freely on the tasks they choose to undertake. OSS projects also rarely rely on explicit system-level design, or on project plans or schedules. Moreover, OSS developers work in arbitrary locations and collaborate almost exclusively over the Internet, using simple tools such as email and software code tracking databases (e.g. CVS). All the characteristics above make OSS development akin to weaving a tapestry of heterogeneous components. The OSS design process relies on various types of actors: people with prescribed roles, but also elements coming from a variety of information spaces (such as email and software code). The objective of our research is to understand the specific hybrid weaving accomplished by the actors of this distributed, collective design process. This, in turn, challenges traditional methodologies used to understand distributed software engineering: OSS development is simply too “fibrous” to lend itself well to analysis under a single methodological lens. In this paper, we describe the methodological framework we articulated to analyze collaborative design in the Open Source world. Our framework focuses on the links between the heterogeneous components of a project’s hybrid network. We combine ethnography, text mining, and socio-technical network analysis and visualization to understand OSS development in its totality. This way, we are able to simultaneously consider the social, technical, and cognitive aspects of OSS development. We describe our methodology in detail, and discuss its implications for future research on distributed collective practices.

  • Research Article
  • Cite Count: 6
  • 10.1145/3690632
Systematic Literature Review of Commercial Participation in Open Source Software
  • Jan 20, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Xuetao Li + 5 more

Open source software (OSS) has been playing a fundamental role in not only information technology but also our social lives. Attracted by various advantages of OSS, a growing number of commercial companies are participating extensively in open source development, and this has had a broad impact. Enormous research efforts have been devoted to understanding this phenomenon and trying to pursue a win-win result. To characterize the current research achievement and identify challenges, this article provides a comprehensive systematic literature review (SLR) of existing research on company participation in OSS. We collected 105 papers and organized them based on their research topics, which cover three main directions, i.e., participation motivation, contribution model, and impact on OSS development. We found that companies have diverse motivations from economic, technological, and social aspects, and no one study covered all the motivation categories. Existing studies categorize five main companies’ contribution models in OSS projects through their objectives and how they shape OSS communities. Researchers also explored how commercial participation affects OSS development, including companies, developers, and OSS projects. This study contributes to a comprehensive understanding of commercial participation in OSS development. Based on our findings, we present a set of research challenges and promising directions for companies’ better participation in OSS.

  • Conference Article
  • Cite Count: 55
  • 10.1109/iceccs.2014.26
How Do Open Source Communities Document Software Architecture: An Exploratory Survey
  • Aug 1, 2014
  • Wei Ding + 4 more

Software architecture (SA) documentation provides a blueprint of a software-intensive system for the communication between stakeholders about the high-level design of the system. In open source software (OSS) development, a lack of SA documentation may hinder the use and further development of OSS, but how much 'architecture' documentation is enough and appropriate is largely dependent on the contextual factors of development. In order to understand the state of the practice of SA documentation in OSS projects, we conducted a documentation-based survey to explore how SA is documented in OSS projects. Out of 2,000 OSS projects from four major OSS sources, we found that 108 projects have some SA documentation, which shows that SA documentation is scarce in OSS development. We analyzed these 108 projects to understand what SA information has been documented and how it has been described. We have found that the frequently-documented architectural information is model, system, and mission, and that natural language is the most frequently-used architectural language for specifying architectural information in OSS SA documents. The results also show that the likelihood that an OSS project will document SA is increased when more developers are involved in the project, and industry and research OSS projects are more likely to create SA documents than freelance projects.

  • Research Article
  • Cite Count: 14
  • 10.1016/j.jss.2021.111035
Architecture information communication in two OSS projects: The why, who, when, and what
  • Jul 12, 2021
  • Journal of Systems and Software
  • Tingting Bi + 3 more

  • Research Article
  • Cite Count: 60
  • 10.3390/info10100309
Barriers Faced by Women in Software Development Projects
  • Oct 9, 2019
  • Information
  • Edna Dias Canedo + 4 more

Computer science is a predominantly male field of study. Women face barriers while trying to insert themselves in the study of computer science. Those barriers extend to when women are exposed to the professional area of computer science. Despite decades of social fights for gender equity in Science, Technology, Engineering, and Mathematics (STEM) education and in computer science in general, few women participate in computer science, and some of the reasons include gender bias and lack of support for women when choosing a computer science career. Open source software development has been increasingly used by companies seeking the competitive advantages gained by team diversity. This diversification of the characteristics of team members includes, for example, the age of the participants, the level of experience, education and knowledge in the area, and their gender. In open source software projects women are underrepresented and a series of biases are involved in their participation. This paper conducts a systematic literature review with the objective of finding factors that could assist in increasing women’s interest in contributing to open source communities and software development projects. The main contributions of this paper are: (i) identification of factors that cause women’s lack of interest (engagement), (ii) possible solutions to increase the engagement of this public, (iii) to outline the profile of professional women who are participating in open source software projects and software development projects. The main findings of this research reveal that women are underrepresented in software development projects and in open source software projects. They represent less than 10% of the total developers and the main causes of this underrepresentation may be associated with their workplace conditions, which reflect male gender bias.

  • Conference Article
  • Cite Count: 1
  • 10.1109/hicss.2005.474
Open Source Software Development: Minitrack Introduction
  • Jan 3, 2005
  • K Crowston + 1 more

In its first year, the minitrack on Open Source Software (OSS) Development will provide a forum for the presentation and discussion of a fascinating and increasingly important mode of software development. OSS is a broad term used to embrace software that is developed and released under some sort of “open source” license. There are thousands of OSS projects, spanning a range of applications, operating systems (e.g., Linux, BSD), Internet infrastructure (e.g., the Apache Web Server, sendmail, bind), user applications (e.g., the GIMP, OpenOffice), programming languages (e.g., Perl, Python, gcc) and games (e.g., Paradise). A key feature of OSS development is the participation of a community of developers and active users primarily via the Internet. This mode of interaction creates new challenges to software development, as team members work in a distributed environment and often as volunteers rather than employees. The empirical literature on software engineering, programmers and the social and technical aspects of software development suggests that such teams would face insurmountable difficulties in developing code, yet in fact some of these teams have been remarkably successful. Researchers from a variety of disciplines have turned their attention to the phenomenon of OSS as an intriguing and successful form of Internet-supported work. Understanding how these teams work is important because a digital society entails an increased use of Internet-supported distributed teams for a wide range of knowledge work. This minitrack brings together nine papers addressing various aspects of the OSS phenomenon. The minitrack starts with the paper “The Mysteries of Open Source Software: Black and White and Red All Over” by Brian Fitzgerald and Par Agerfalk. This paper offers a general discussion of the OSS concept, noting a number of “contradictions, paradoxes and tensions throughout”. 
The session continues with two papers discussing community issues in OSS project teams in more detail. The first, “Collaboration, Leadership, Control, and Conflict Negotiation in the Netbeans.org Open Source Software Development Community” by Chris Jensen and Walt Scacchi, examines leadership and control sharing across organizations and individuals, in and between communities, using the Netbeans.org community as an example. The second paper, “Contrasting Community Building in Sponsored and Community Founded Open Source Projects” by Joel West and Siobhan O'Mahony, contrasts the lifecycles of two kinds of OSS projects, community-founded vs. spinouts from an organization, and discusses in particular the problems of building a community in the latter case. The second session includes three papers that focus on the internal workings of OSS projects. The first, “Effective work practices for FLOSS development: A model and propositions” by Kevin Crowston, Hala Annabi, James Howison and Chengetai Masango, develops a set of propositions about the performance of FLOSS teams based on Hackman’s model of effectiveness of work teams. The second paper, “Discussion of a Large-Scale Open Source Data Collection Methodology” by Michael Hahsler and Stefan Koch, presents a set of research areas that could be studied by collecting data on a large number of open source software projects from a single project repository. The final paper in the session, “A Preliminary Analysis of the Influences of Licensing and Organizational Sponsorship on Success in Open Source Projects” by Katherine J. Stewart, Anthony P. Ammeter and Likoebe M. Maruping, develops a model of the impact of licensing restrictiveness and organizational sponsorship on the popularity and vitality of open source software (OSS) development projects and tests it using data from Freshmeat.net and OSS project home pages. The final session includes two papers that consider relations between projects. 
The first of these, “A Topological Analysis of the Open Source Software Development Community” by Jin Xu, Yongqin Gao, Scott Christley and Gregory Madey, uses social network data about SourceForge developers to examine the topology and evolution of the OSS development community. The second, “Shifting the Creative Effort: Knowledge Reuse in Open Source Software Development” by Stefan Haefliger and Sebastian Spaeth, examines the forms and extent of knowledge reuse from a sample of six open source software projects. The final paper in the minitrack, “Exploring Usability Discussions in Open Source Development” by Michael B. Twidale and David M. Nichols, examines bug reports from several projects to characterize how developers address and resolve issues concerning user interface and interaction design. These nine papers provide a cross-section of the current state of the research on Open Source Software development. We thank all authors who submitted papers and the reviewers for their contributions to the mini-track.

  • Research Article
  • Cite Count: 11
  • 10.1142/s0218539319500220
Productivity Assessment Based on Jump Diffusion Model Considering the Effort Management for OSS Project
  • Jun 30, 2019
  • International Journal of Reliability, Quality and Safety Engineering
  • Yoshinobu Tamura + 2 more

Various open source software (OSS) projects are in action around the world. Many OSS are developed and maintained under these OSS projects. Considering the characteristics of OSS, the operation performance of OSS development will take an irregular fluctuation in the long term of operation, because several developers and many users are closely related to the maintenance of OSS. This paper focuses on the irregular fluctuation of the operation performance of OSS. We apply the jump diffusion process model to the noisy cases in the operation of OSS. In particular, the maintenance effort is estimated by the stochastic differential equation model in terms of OSS project management. Moreover, we discuss the method of maintenance effort management based on jump diffusion process model considering the irregular fluctuation of performance for OSS projects. In particular, we propose the method of productivity assessment based on the proposed jump diffusion models. Thereby, it is helpful for the OSS development managers to understand the effort status of OSS from the standpoint of OSS project management. Also, we analyze actual data to show numerical examples of the proposed method considering the characteristics of OSS projects.

  • Research Article
  • Cite Count: 7
  • 10.1016/j.patrec.2018.10.030
Software expert discovery via knowledge domain embeddings in a collaborative network
  • Oct 26, 2018
  • Pattern Recognition Letters
  • Chaoran Huang + 4 more

  • Research Article
  • Cite Count: 7
  • 10.1109/tkde.2017.2696535
Scalable Algorithms for CQA Post Voting Prediction
  • Aug 1, 2017
  • IEEE Transactions on Knowledge and Data Engineering
  • Yuan Yao + 3 more

Community Question Answering (CQA) sites, such as Stack Overflow and Yahoo! Answers, have become very popular in recent years. These sites contain rich crowdsourcing knowledge contributed by the site users in the form of questions and answers, and these questions and answers can satisfy the information needs of more users. In this article, we aim at predicting the voting scores of questions/answers shortly after they are posted in the CQA sites. To accomplish this task, we identify three key aspects that matter with the voting of a post, i.e., the non-linear relationships between features and output, the question and answer coupling, and the dynamic fashion of data arrivals. A family of algorithms are proposed to model the above three key aspects. Some approximations and extensions are also proposed to scale up the computation. We analyze the proposed algorithms in terms of optimality, correctness, and complexity. Extensive experimental evaluations conducted on two real data sets demonstrate the effectiveness and efficiency of our algorithms.

  • Conference Article
  • Cite Count: 24
  • 10.1109/aswec.2018.00027
Code Reuse in Stack Overflow and Popular Open Source Java Projects
  • Nov 1, 2018
  • Adriaan Lotter + 3 more

Solutions provided in Question and Answer (Q&A) websites such as Stack Overflow are regularly used in Open Source Software (OSS). However, many developers are unaware that both Stack Overflow and OSS are governed by licenses. Hence, developers reusing code from Stack Overflow for their OSS projects may violate licensing agreements if their attributions are not correct. Additionally, if code migrates from one OSS through Stack Overflow to another OSS, then complex licensing issues are likely to exist. Such forms of software reuse also have implications for future software maintenance, particularly where developers have poor understanding of copied code. This paper investigates code reuse between these two platforms (i.e., Stack Overflow and OSS), with the aim of providing insights into this issue. This study mined 151,946 Java code snippets from Stack Overflow, 16,617 Java files from 12 of the top weekly listed projects on SourceForge and GitHub, and 39,616 Java files from the top 20 most popular Java projects on SourceForge. Our analyses were aimed at finding the number of clones (indicating reuse) (a) within Stack Overflow posts, (b) between Stack Overflow and popular Java OSS projects, and (c) between the projects. Outcomes reveal that there was up to 3.3% code reuse within Stack Overflow, while 1.0% of Stack Overflow code was reused in recent popular Java projects and 2.3% in those projects that were more established. Reuse across projects was much higher, accounting for as much as 77.2%. Our outcomes have implication for strategies aimed at introducing strict quality assurance measures to ensure the appropriateness of code reuse, and licensing requirements awareness.

  • Dissertation
  • Cite Count: 8
  • 10.11606/t.45.2015.tde-30112015-131552
Supporting newcomers to overcome the barriers to contribute to open source software projects
  • Jan 1, 2015
  • Igor Fábio Steinmacher

Community-based Open Source Software (OSS) projects are generally self-organized and dynamic, receiving contributions from volunteers spread across the globe. These communities' survival, long-term success, and continuity demand a constant influx of newcomers. However, newcomers face many barriers when making their first contribution to an OSS project, leading in many cases to dropouts. Therefore, a major challenge for OSS projects is to provide ways to support newcomers during their first contribution. In this thesis, our goal was to identify and understand the barriers newcomers face and provide appropriate strategies to lower these barriers. Toward this end, we conducted multiple studies, using multiple research methods. To identify the barriers, we used data collected from: semi-structured interviews with 35 developers from 13 different projects; 24 answers to an open questionnaire conducted with OSS developers; feedback from 9 graduate and undergraduate students after they tried to join OSS projects; and 20 primary studies gathered via a systematic literature review. The data was analyzed using Grounded Theory procedures: namely, open and axial coding. Subsequently, the analysis resulted in a preliminary conceptual model composed of 58 barriers grouped into six categories: cultural differences, newcomers' characteristics, reception issues, newcomers' orientation, technical hurdles, and documentation problems. Based on the conceptual model, we developed FLOSScoach, a portal to support newcomers making their first OSS project contribution. To assess the portal, we conducted a study with undergraduate students, relying on qualitative data from diaries, self-efficacy questionnaires, and the Technology Acceptance Model. 
By applying the model to a practical application and assessing it, we could evaluate and improve the barriers model, changing it according to improvements identified during the conception of the tool, as well as suggestions received from the study participants. The FLOSScoach study results indicate that the portal played an important role guiding newcomers and lowering barriers related to the orientation and contribution process, whereas it was inefficient in lowering technical barriers. We also found that the portal is useful, easy to use, and increased newcomers' confidence to contribute. The main contributions of this thesis are: (i) empirical identification and modeling of barriers faced by OSS project newcomers; and (ii) a portal providing information to support OSS project newcomers.

  • Book Chapter
  • 10.1007/978-1-4471-7503-2_23
OSS Reliability Analysis and Project Effort Estimation Based on Computational Intelligence
  • Jan 1, 2023
  • Shigeru Yamada + 2 more

OSS (open-source software) systems serve as the key components of critical infrastructures in the society. As for the OSS development paradigm, the bug tracking systems are used for software quality management in many OSS projects. It is important to appropriately control the quality for the progress status of OSS project, because the software failure is caused by the poor handling of effort control. In particular, the GUI of OSS will be frequently made a dramatic difference according to the major version upgrade. The changing in GUI of OSS will depend on the development and management effort of OSS in the specified version. Considering the relationship between GUI and OSS development process, the UX/UI design of OSS will change with the procedure of OSS development. This chapter focuses on the method of effort estimation for OSS project. Then, the pixel data and OSS fault big data are analyzed by using the deep learning. Moreover, we discuss the effort assessment method in the development phase by using the effort data.

  • Research Article
  • Cite Count: 3
  • 10.1177/0165551518808198
A user ranking algorithm for efficient information management of community sites using spectral clustering and folksonomy
  • Oct 22, 2018
  • Journal of Information Science
  • Abhishek Kumar Singh + 2 more

Community question answering (CQA) sites are a major platform for information sharing where posts are created by users as questions and answers. A large number of posts are created on a day-to-day basis, which raises the problem of information management for these sites. Multiple techniques are suggested in existing research for efficient management of CQA sites. Many of the existing techniques use user ranking for managing CQA sites but ignore the tagging data and user subject area. In this article, a user ranking method is derived using spectral clustering for post management by considering the tagging data of CQA sites. Folksonomy is used to build relationships between tags, posts and users. The proposed method is developed in three stages. In the first stage, the folksonomy relation is created and a user similarity graph is built with the help of tag frequency-inverse post frequency and text similarity techniques. In the second stage, a spectral clustering algorithm is applied to the user similarity graph to group similar users. Finally, in the third stage, the rank of users is identified from the clusters based on users’ information. The clustered users and the rank of the users are generated as the output of the proposed algorithm, which can provide a way of efficient information management. The experimental results show that the proposed user ranking algorithm outperforms the other considered ranking algorithms and can be helpful for information management of CQA sites. Some real-life applications of information management in CQA sites using the proposed work are also demonstrated in this article.
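
The first stage described above can be sketched roughly as follows. The abstract gives no formula for tag frequency-inverse post frequency, so this sketch assumes it is defined by analogy with TF-IDF: a user's count for a tag, multiplied by the log of total posts over posts carrying that tag. The sample data, function names, and the use of cosine similarity for graph edges are all hypothetical, and the spectral clustering and ranking stages are not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical folksonomy records: (user, post_id, tags).
records = [
    ("alice", 1, ["java", "xml"]),
    ("alice", 2, ["java", "maven"]),
    ("bob", 3, ["java", "xml"]),
    ("carol", 4, ["css", "html"]),
    ("carol", 5, ["css"]),
]


def tf_ipf(records):
    """Weight each (user, tag) pair by tag frequency times inverse post frequency.

    Assumed definition: weight = count(user, tag) * log(N / posts_with_tag).
    """
    n_posts = len(records)
    post_freq = Counter(tag for _, _, tags in records for tag in set(tags))
    user_tags = defaultdict(Counter)
    for user, _, tags in records:
        user_tags[user].update(tags)
    return {
        user: {t: c * math.log(n_posts / post_freq[t]) for t, c in counts.items()}
        for user, counts in user_tags.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse tag-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


weights = tf_ipf(records)
users = sorted(weights)
# User similarity graph as a dict of weighted edges between user pairs.
graph = {
    (u, v): cosine(weights[u], weights[v])
    for i, u in enumerate(users)
    for v in users[i + 1:]
}
for edge, sim in graph.items():
    print(edge, round(sim, 3))
```

With this toy data, alice and bob share the java and xml tags and get a positive edge weight, while carol shares no tags with either and gets zero. The resulting weighted graph is the kind of input a spectral clustering step would take.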
