Characterizing architecture related posts and their usefulness in Stack Overflow

Similar Papers
  • Conference Article
  • Citations: 24
  • 10.1109/icsa.2017.31
Developing an Ontology for Architecture Knowledge from Developer Communities
  • Apr 1, 2017
  • Mohamed Soliman + 2 more

Software architecting is a knowledge-intensive activity. However, obtaining and evaluating the quality of relevant and reusable knowledge (and ensuring that this knowledge is up-to-date) requires significant effort. In this paper, we explore how online developer communities (e.g., Stack Overflow), traditionally used by developers to solve coding problems, can help solve architectural problems. We develop an ontology that covers architectural knowledge concepts in Stack Overflow. The ontology provides a description of architecture-relevant information to represent and structure architectural knowledge in Stack Overflow. The ontology is empirically grounded through qualitative analyses of different Stack Overflow posts, as well as inter-coder reliability tests. Our results show that the architecture knowledge ontology in Stack Overflow captures architecture-relevant information and supports achieving practitioners' requirements and concerns.

  • Research Article
  • Citations: 63
  • 10.1007/s10664-016-9430-z
The structure and dynamics of knowledge network in domain-specific Q&A sites: a case study of stack overflow
  • Apr 19, 2016
  • Empirical Software Engineering
  • Deheng Ye + 2 more

Programming-specific Q&A sites (e.g., Stack Overflow) are used extensively by software developers for knowledge sharing and acquisition. Because questions and answers cross-reference one another, knowledge diffuses through the Q&A site, forming a large knowledge network. (Users also reference URLs external to the Q&A site; in this paper, URL sharing refers to internal URLs within the Q&A site unless otherwise stated.) In Stack Overflow, why do developers share URLs? How does the community respond to the knowledge being shared? What are the unique topological and semantic properties of the resulting knowledge network in Stack Overflow? Has this knowledge network become stable? If so, how does it reach stability? Answering these questions can help the software engineering community better understand the knowledge diffusion process in programming-specific Q&A sites like Stack Overflow, thereby enabling more effective knowledge sharing, knowledge use, and knowledge representation and search in the community. Previous work has focused on analyzing user activities in Q&A sites or mining the textual content of these sites. In this article, we present a methodology to analyze URL sharing activities in Stack Overflow. We use an open coding method to analyze why users share URLs in Stack Overflow, and develop a set of quantitative analysis methods to study the structural and dynamic properties of the emergent knowledge network in Stack Overflow. We also identify system designs, community norms, and social behavior theories that help explain our empirical findings. Through this study, we obtain an in-depth understanding of the knowledge diffusion process in Stack Overflow and expose the implications of URL sharing behavior for Q&A site design, developers who use crowdsourced knowledge in Stack Overflow, and future research on knowledge representation and search.

  • Research Article
  • Citations: 21
  • 10.1109/tse.2020.2981898
Contextual Documentation Referencing on Stack Overflow
  • Feb 5, 2020
  • IEEE Transactions on Software Engineering
  • Sebastian Baltes + 2 more

Software engineering is knowledge-intensive and requires software developers to continually search for knowledge, often on community question answering platforms such as Stack Overflow. Such information sharing platforms do not exist in isolation, and part of the evidence that they exist in a broader software documentation ecosystem is the common presence of hyperlinks to other documentation resources found in forum posts. With the goal of helping to improve the information diffusion between Stack Overflow and other documentation resources, we conducted a study to answer the question of how and why documentation is referenced in Stack Overflow threads. We sampled and classified 759 links from two different domains, regular expressions and Android development, to qualitatively and quantitatively analyze the links’ context and purpose, including attribution, awareness, and recommendations. We found that links on Stack Overflow serve a wide range of distinct purposes, ranging from citation links attributing content copied into Stack Overflow, through links clarifying concepts using Wikipedia pages, to recommendations of software components and resources for background reading. This purpose spectrum has major corollaries, including our observation that links to documentation resources are a reflection of the information needs typical to a technology domain. We contribute a framework and method to analyze the context and purpose of Stack Overflow links, a public dataset of annotated links, and a description of five major observations about linking practices on Stack Overflow. Those observations include the above-mentioned purpose spectrum, its interplay with documentation resources and application domains, and the fact that links on Stack Overflow often lack context in the form of accompanying quotes or summaries. We further point to potential tool support to enhance the information diffusion between Stack Overflow and other documentation resources.

  • Research Article
  • Citations: 4
  • 10.1049/2023/6613434
An Observational Study on React Native (RN) Questions on Stack Overflow (SO)
  • Jan 1, 2023
  • IET Software
  • Luluh Albesher + 2 more

Mobile applications are continuously increasing in prevalence. One of the main challenges in mobile application development is creating cross‐platform applications. To facilitate developing cross‐platform applications, the software engineering community created several solutions, one of which is React Native (RN), which is a popular cross‐platform framework. The software engineering literature demonstrated the effectiveness of Stack Overflow (SO) in providing real‐world perspectives on a variety of technical subjects. Therefore, this study aims to gain a better understanding of the stance of RN on SO. We identified and analyzed 131,620 SO RN‐related questions. Moreover, we observed how the interest toward RN on SO evolves over time. Additionally, we utilized Latent Dirichlet Allocation (LDA) to identify RN‐related topics that are discussed within the questions. Afterward, we utilized a number of proxy measures to estimate the popularity and difficulty of these topics. The results revealed that interest toward RN on SO was generally increasing. Moreover, RN‐related questions revolve around six topics, with the topics of layout and navigation being the most popular and the topic of iOS issues being the most difficult. Software engineering researchers, practitioners, educators, and RN contributors may find the results of this study beneficial in guiding their future RN efforts.

  • Research Article
  • Citations: 142
  • 10.1016/j.infsof.2017.10.009
How to ask for technical help? Evidence-based guidelines for writing questions on Stack Overflow
  • Nov 6, 2017
  • Information and Software Technology
  • Fabio Calefato + 2 more

  • Conference Article
  • Citations: 90
  • 10.1109/saner.2017.7884629
Stack Overflow: A code laundering platform?
  • Feb 1, 2017
  • Le An + 3 more

Developers use Question and Answer (Q&A) websites to exchange knowledge and expertise. Stack Overflow is a popular Q&A website where developers discuss coding problems and share code examples. Although all Stack Overflow posts are free to access, code examples on Stack Overflow are governed by the Creative Commons Attribution-ShareAlike 3.0 Unported license that developers should obey when reusing code from Stack Overflow or posting code to Stack Overflow. In this paper, we conduct a case study with 399 Android apps, to investigate whether developers respect license terms when reusing code from Stack Overflow posts (and the other way around). We found 232 code snippets in 62 Android apps from our dataset that were potentially reused from Stack Overflow, and 1,226 Stack Overflow posts containing code examples that are clones of code released in 68 Android apps, suggesting that developers may have copied the code of these apps to answer Stack Overflow questions. We investigated the licenses of these pieces of code and observed 1,279 cases of potential license violations (related to code posting to Stack Overflow or code reuse from Stack Overflow). This paper aims to raise the awareness of the software engineering community about potential unethical code reuse activities taking place on Q&A websites like Stack Overflow.

  • Research Article
  • Citations: 7
  • 10.1007/s10664-021-10028-y
An exploratory study on the repeatedly shared external links on Stack Overflow
  • Nov 3, 2021
  • Empirical Software Engineering
  • Jiakun Liu + 6 more

On Stack Overflow, users reuse 11,926,354 external links to share the resources hosted outside the Stack Overflow website. The external links connect to the existing programming-related knowledge and extend the crowdsourced knowledge on Stack Overflow. Some of the external links, so-called repeated external links, can be shared multiple times. We observe that 82.5% of the link sharing activities (i.e., sharing links in any question, answer, or comment) on Stack Overflow share external resources, and 57.0% of the occurrences of the external links are sharing the repeated external links. However, it is still unclear what types of external resources are repeatedly shared. To help users manage their knowledge, we wish to investigate the characteristics of the repeated external links in knowledge sharing on Stack Overflow. In this paper, we analyze the repeated external links on Stack Overflow. We observe that external links that point to text resources (hosted in documentation websites, tutorial websites, etc.) are repeatedly shared the most. We observe that: 1) different users repeatedly share the same knowledge in the form of repeated external links, thus increasing the maintenance effort of knowledge (e.g., updating invalid links in multiple posts), 2) the same users can repeatedly share the external links for the purpose of promotion, and 3) external links can point to webpages with an overload of information, making it difficult for users to retrieve relevant information. Our findings provide insights to Stack Overflow moderators and researchers. For example, we encourage Stack Overflow to centrally manage the commonly occurring knowledge in the form of repeated external links in order to better maintain the crowdsourced knowledge on Stack Overflow.

  • Research Article
  • Citations: 1
  • 10.1142/s0218194023500274
Understanding the Role of Stack Overflow in Supporting Software Development Tasks: A Research Perspective
  • Jun 26, 2023
  • International Journal of Software Engineering and Knowledge Engineering
  • Wenhua Yang + 1 more

Stack Overflow is a Q&A website that is popular among developers and extensively used in software engineering (SE) research. A significant body of research has examined how Stack Overflow can assist with software development tasks, such as recommending APIs. However, while researchers have recognized the importance of Stack Overflow in SE research related to software development tasks, the specific ways in which it is utilized and the reasons for its widespread usage in research have not been thoroughly explored. To address these knowledge gaps, we conducted the first study to understand the role of Stack Overflow in assisting with SE research regarding software development tasks by systematically examining relevant and high-quality research works. Meanwhile, we carried out a qualitative survey to gain insight into why researchers choose to utilize Stack Overflow in SE research and to solicit suggestions for the better use of Stack Overflow in research. The study identifies trends in the research area, prominent researchers and organizations, and the types of tasks that utilize Stack Overflow in research, with coding and debugging being the most common. Moreover, it examines how Stack Overflow data is utilized in SE research regarding software development tasks, including searching, training models, and mining associations. Our qualitative survey of researchers indicates that the popularity of Stack Overflow stems from its comprehensive explanations of technical topics that are often not found in documentation or manuals. The findings provide a comprehensive understanding of the role of Stack Overflow in SE research regarding software development tasks, and offer actionable implications for both researchers and stakeholders of Stack Overflow to facilitate future research and improvements.

  • Conference Article
  • Citations: 16
  • 10.1145/3468264.3468582
Characterizing search activities on stack overflow
  • Aug 18, 2021
  • Jiakun Liu + 5 more

To solve programming issues, developers commonly search on Stack Overflow to seek potential solutions. However, there is a gap between the knowledge developers are interested in and the knowledge they are able to retrieve using search engines. To help developers efficiently retrieve relevant knowledge on Stack Overflow, prior studies proposed several techniques to reformulate queries and generate summarized answers. However, few studies performed a large-scale analysis using real-world search logs. In this paper, we characterize how developers search on Stack Overflow using such logs. By doing so, we identify the challenges developers face when searching on Stack Overflow and seek opportunities for the platform and researchers to help developers efficiently retrieve knowledge. To characterize search activities on Stack Overflow, we use search log data based on requests to Stack Overflow's web servers. We find that the most common search activity is reformulating the immediately preceding queries. Related work looked into query reformulations when using generic search engines and found 13 types of query reformulation strategies. Compared to their results, we observe that 71.78% of the reformulations can be fitted into those reformulation strategies. In terms of how queries are structured, 17.41% of the search sessions only search for fragments of source code artifacts (e.g., class and method names) without specifying the names of programming languages, libraries, or frameworks. Based on our findings, we provide actionable suggestions for Stack Overflow moderators and outline directions for future research. For example, we encourage Stack Overflow to set up a database that includes the relations between all computer programming terminologies shared on Stack Overflow, e.g., method name, data structure name, design pattern, and IDE name. By doing so, Stack Overflow could improve the performance of search engines by considering related programming terminologies at different levels of granularity.

  • Research Article
  • Citations: 60
  • 10.1108/dta-07-2017-0054
A survey on mining stack overflow: question and answering (Q&A) community
  • Feb 9, 2018
  • Data Technologies and Applications
  • Arshad Ahmad + 3 more

Purpose: Software developers extensively use Stack Overflow (SO) for knowledge sharing on software development. Thus, software engineering researchers have started mining the structured/unstructured data present in certain software repositories, including the Q&A software developer community SO, with the aim of improving software development. The purpose of this paper is to show how academics/practitioners can benefit from the valuable user-generated content shared on various online social networks, specifically from the Q&A community SO, for software development.

Design/methodology/approach: A comprehensive literature review was conducted, and 166 research papers on SO about software development, published from the inception of SO until June 2016, were categorized.

Findings: Most of the studies revolve around a limited number of software development tasks; approximately 70 percent of the papers used data from millions of posts, applied basic machine learning methods, and conducted semi-automatic investigations and quantitative studies. Thus, future research should focus on overcoming the existing identified challenges and gaps.

Practical implications: The work on SO is classified into two main categories: “SO design and usage” and “SO content applications.” These categories not only give insights to Q&A forum providers about the shortcomings in the design and usage of such forums but also provide ways to overcome them in the future. They also enable software developers to exploit such forums for the identified under-utilized tasks of software development.

Originality/value: The study is the first of its kind to explore the work on SO about software development and makes an original contribution by presenting a comprehensive review, design/usage shortcomings of Q&A sites, and future research challenges.

  • Conference Article
  • Citations: 225
  • 10.1109/msr.2013.6624015
Answering questions about unanswered questions of Stack Overflow
  • May 1, 2013
  • Muhammad Asaduzzaman + 3 more

Community-based question answering services accumulate large volumes of knowledge through the voluntary services of people across the globe. Stack Overflow is an example of such a service that targets developers and software engineers. In general, questions in Stack Overflow are answered in a very short time. However, we found that the number of unanswered questions has increased significantly in the past two years. Understanding why questions remain unanswered can help information seekers improve the quality of their questions, increase their chances of getting answers, and better decide when to use Stack Overflow services. In this paper, we mine data on unanswered questions from Stack Overflow. We then conduct a qualitative study to categorize unanswered questions, which reveals characteristics that would be difficult to find otherwise. Finally, we conduct an experiment to determine whether we can predict how long a question will remain unanswered in Stack Overflow.

  • Research Article
  • Citations: 1
  • 10.1049/sfw2/1905538
An Observational Study on Flask Web Framework Questions on Stack Overflow (SO)
  • Jan 1, 2024
  • IET Software
  • Luluh Albesher + 1 more

Web‐based applications are in high demand and widely used. To facilitate the development of web‐based applications, the software engineering community developed multiple web application frameworks, one of which is Flask. Flask is a popular web framework that allows developers to speed up and scale the development of web applications. A review of the software engineering literature revealed that the Stack Overflow (SO) website has proven its effectiveness in providing a better understanding of multiple subjects within the software engineering field. This study aims to analyze SO Flask‐related questions to gain a better understanding of the stance of Flask on the website. We identified a set of 70,230 Flask‐related questions that we further analyzed to estimate how the interest towards the framework evolved over time on the website. Afterward, we utilized the Latent Dirichlet Allocation (LDA) algorithm to identify Flask‐related topics that are discussed within the set of the identified questions. Moreover, we leveraged a number of proxy measures to examine the difficulty and popularity of the identified topics. The study found that the interest towards Flask has been generally increasing on the website, peaking in 2020 and declining in the following years. Moreover, Flask‐related questions on SO revolve around 12 topics, where Application Programming Interface (API) can be considered the most popular topic and background tasks can be considered the most difficult one. Software engineering researchers, practitioners, educators, and Flask contributors may find this study useful in guiding their future Flask‐related endeavors.

  • Research Article
  • Citations: 41
  • 10.1016/j.infsof.2021.106667
On the value of encouraging gender tolerance and inclusiveness in software engineering communities
  • Nov 1, 2021
  • Information and Software Technology
  • Elijah Zolduoarrati + 1 more

  • Conference Article
  • Citations: 7
  • 10.1109/saner56733.2023.00063
Architecture Decisions in AI-based Systems Development: An Empirical Study
  • Mar 1, 2023
  • Beiqi Zhang + 5 more

Artificial Intelligence (AI) technologies have been developed rapidly, and AI-based systems have been widely used in various application domains with opportunities and challenges. However, little is known about the architecture decisions made in AI-based systems development, which have a substantial impact on the success and sustainability of these systems. To this end, we conducted an empirical study by collecting and analyzing the data from Stack Overflow (SO) and GitHub. More specifically, we searched on SO with six sets of keywords and explored 32 AI-based projects on GitHub, and finally we collected 174 posts and 128 GitHub issues related to architecture decisions. The results show that in AI-based systems development (1) architecture decisions are expressed in six linguistic patterns, among which Solution Proposal and Information Giving are the most frequently used, (2) Technology Decision, Component Decision, and Data Decision are the main types of architecture decisions made, (3) Game is the most common application domain among the eighteen application domains identified, (4) the dominant quality attribute considered in architecture decision-making is Performance, and (5) the main limitations and challenges encountered by practitioners in making architecture decisions are Design Issues and Data Issues. Our results suggest that the limitations and challenges when making architecture decisions in AI-based systems development are highly specific to the characteristics of AI-based systems and are mainly of a technical nature, which need to be properly confronted.

  • Conference Article
  • 10.1145/3756681.3757002
Bridging AI and Human Knowledge: Towards a Deeper Understanding of Stack Overflow and ChatGPT
  • Jun 17, 2025
  • Aman Swaraj + 1 more

Community-driven forums like Stack Overflow (SO) have long established themselves as the go-to platform for developers seeking online help. Recently, ChatGPT, a powerful AI tool capable of generating high-level code and providing detailed explanations, has emerged as a strong alternative. While both platforms are valuable for developers, determining the best choice for specific use cases remains an open challenge. Although previous studies have examined the comparative merits of these platforms, the datasets used in such evaluations were limited. To bridge this gap, we introduce a four-dimensional benchmark dataset, ‘SEED’, that can facilitate a comprehensive analysis of ChatGPT and Stack Overflow. Our dataset comprises: (i) Developer Sentiments mined from 4161 comments from Reddit and SO meta-discussions, indicating community perceptions of both platforms, along with a manually labeled subset of 1,000 comments capturing developers’ expressed preferences; (ii) 3500 technical questions from SO, their accepted answers, and corresponding ChatGPT-generated responses for Efficacy (accuracy) benchmarking; (iii) An additional 200 deep learning-related SO posts, their accepted answers, and the corresponding ChatGPT answers to evaluate both these platforms on Energy efficiency parameters; (iv) 4,500 ChatGPT code snippets generated using tailor-made prompts designed to mimic SO answers for Detecting AI-code plagiarism. SEED can support diverse applications, including benchmarking AI-generated answers, evaluating energy efficiency in deep learning development, detecting AI plagiarism, and analyzing developer sentiment. By making this dataset publicly available, we lay the seed for advancing the research involving human-AI interaction in software engineering. Our dataset can be accessed at https://github.com/AnonymousResearch173/SEED.
