Broken External Links on Stack Overflow

Abstract

Stack Overflow hosts valuable programming-related knowledge with 11,926,354 links that reference third-party websites. These links to resources hosted outside Stack Overflow extend its knowledge base substantially. However, with the rapid development of programming-related knowledge, many resources hosted on the Internet are no longer available. Based on our analysis of the Stack Overflow data released on Jun. 2, 2019, 14.2% of the links on Stack Overflow are broken. Broken links can obstruct viewers from obtaining the programming-related knowledge they want, and can potentially damage the reputation of Stack Overflow, as viewers might regard posts with broken links as obsolete. In this paper, we characterize the broken links on Stack Overflow. 65% of the broken links in our sampled questions are used to show examples, e.g., code examples. 70% of the broken links in our sampled answers are used to provide supporting information, e.g., explaining a certain concept or describing a step to solve a problem. Only 1.67% of the posts with broken links are highlighted as such by viewers in the posts' comments. In only 5.8% of the posts with broken links were the broken links removed. Viewers cannot fully rely on vote scores to detect broken links, as broken links are common across posts with different vote scores. Websites whose hosted resources can be maintained by their users are referenced by broken links the most on Stack Overflow; a prominent example of such websites is GitHub. Posts and comments related to web technologies, i.e., JavaScript, HTML, CSS, and jQuery, are associated with more broken links. Based on our findings, we shed light on future directions and provide recommendations for practitioners and researchers.
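As an illustration of the first step such an analysis needs, the sketch below pulls external links out of a post's HTML body and applies a simple "broken" rule to an HTTP status code. It is a minimal sketch, not the authors' pipeline: the INTERNAL_HOSTS set, the function names, and the rule that any 4xx/5xx status or missing response counts as broken are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: hosts treated as internal to Stack Overflow (not taken from the paper).
INTERNAL_HOSTS = {"stackoverflow.com", "www.stackoverflow.com"}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a post body."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def external_links(post_html):
    """Return the http(s) links in post_html that point outside INTERNAL_HOSTS."""
    collector = LinkCollector()
    collector.feed(post_html)
    result = []
    for href in collector.links:
        parsed = urlparse(href)
        host = (parsed.hostname or "").lower()
        if parsed.scheme in ("http", "https") and host and host not in INTERNAL_HOSTS:
            result.append(href)
    return result


def is_broken(status_code):
    """Hypothetical rule: no response (None) or any 4xx/5xx status is broken."""
    return status_code is None or status_code >= 400
```

In practice the status code would come from an HTTP request to each link; that step is left out here so the sketch stays offline and deterministic.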

Similar Papers
  • Research Article
  • 10.5281/zenodo.4683732
Broken external links on Stack Overflow
  • Oct 10, 2020
  • arXiv (Cornell University)
  • Jiakun Liu + 6 more

This is the dataset, coding guides, and scripts for our paper: Broken external links on Stack Overflow.

  • Conference Article
  • Cite count: 90
  • 10.1109/saner.2017.7884629
Stack Overflow: A code laundering platform?
  • Feb 1, 2017
  • Le An + 3 more

Developers use Question and Answer (Q&A) websites to exchange knowledge and expertise. Stack Overflow is a popular Q&A website where developers discuss coding problems and share code examples. Although all Stack Overflow posts are free to access, code examples on Stack Overflow are governed by the Creative Commons Attribution-ShareAlike 3.0 Unported license, which developers should obey when reusing code from Stack Overflow or posting code to Stack Overflow. In this paper, we conduct a case study with 399 Android apps to investigate whether developers respect license terms when reusing code from Stack Overflow posts (and the other way around). We found 232 code snippets in 62 Android apps from our dataset that were potentially reused from Stack Overflow, and 1,226 Stack Overflow posts containing code examples that are clones of code released in 68 Android apps, suggesting that developers may have copied the code of these apps to answer Stack Overflow questions. We investigated the licenses of these pieces of code and observed 1,279 cases of potential license violations (related to posting code to Stack Overflow or reusing code from Stack Overflow). This paper aims to raise the awareness of the software engineering community about potentially unethical code reuse activities taking place on Q&A websites like Stack Overflow.

  • Research Article
  • Cite count: 8
  • 10.1145/3691628
A Large-Scale Study of IoT Security Weaknesses and Vulnerabilities in the Wild
  • Jan 20, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Madhu Selvaraj + 1 more

Internet of Things (IoT) is defined as the connection between places and physical objects (i.e., things) over the internet/network via smart computing devices. IoT is a rapidly emerging paradigm that now encompasses almost every aspect of our modern life. As these devices differ from traditional computing, it is important to understand the challenges IoT developers face while implementing proper security measures in their IoT devices. We observed that IoT software developers share solutions to programming questions as code examples on three Stack Exchange Q&A sites: Stack Overflow (SO), Arduino, and Raspberry Pi. Previous research studies found vulnerabilities/weaknesses in C/C++ code examples shared in SO. However, those studies investigated SO code examples only and did not investigate C/C++ code examples related to IoT. In this article, we conduct a large-scale empirical study of all IoT C/C++ code examples shared in the three Stack Exchange sites, i.e., SO, Arduino, and Raspberry Pi. From the 11,329 code snippets obtained from the three sites, we identify 29 distinct Common Weakness Enumeration (CWE) types in 609 snippets. These CWE types can be categorized into eight general weakness categories, and we observe that evaluation, memory, and initialization-related weaknesses are the ones most commonly introduced by users when posting programming solutions. Furthermore, we find that 39.58% of the vulnerable code snippets contain instances of CWE types that can be mapped to real-world occurrences of those CWE types (i.e., CVE instances). The largest number of vulnerable IoT code examples was found in Arduino, followed by SO and Raspberry Pi. Memory-type vulnerabilities are on the rise in the sites. For example, of the 3,595 mapped CVE instances, we find that 28.99% result in Denial of Service (DoS) errors, which are particularly harmful for network-reliant IoT devices such as smart cars. Our study results can make various IoT stakeholders aware of such vulnerable IoT code examples and can inform IoT researchers as they develop tools that help prevent developers from sharing such vulnerable code examples in the sites.

  • Research Article
  • Cite count: 46
  • 10.1016/j.infsof.2020.106277
Mining API usage scenarios from stack overflow
  • Feb 8, 2020
  • Information and Software Technology
  • Gias Uddin + 2 more

  • Conference Article
  • Cite count: 63
  • 10.1109/icse.2019.00065
How Reliable is the Crowdsourced Knowledge of Security Implementation?
  • May 1, 2019
  • Mengsu Chen + 4 more

Stack Overflow (SO) is the most popular online Q&A site for developers to share their expertise in solving programming issues. Given multiple answers to certain questions, developers may take the accepted answer, the answer from a person with high reputation, or the one frequently suggested. However, researchers recently observed exploitable security vulnerabilities in popular SO answers. This observation inspires us to explore the following questions: How much can we trust the security implementation suggestions on SO? If suggested answers are vulnerable, can developers rely on the community's dynamics to infer the vulnerability and identify a secure counterpart? To answer these highly important questions, we conducted a study on SO posts by contrasting secure and insecure advice with the community-given content evaluation. We investigated whether SO's incentive mechanism is effective in improving the security properties of distributed code examples. Moreover, we also traced duplicated answers to assess whether the community behavior facilitates propagation of secure and insecure code suggestions. We compiled 953 different groups of similar security-related code examples and labeled their security, identifying 785 secure answer posts and 644 insecure ones. Compared with secure suggestions, insecure ones had higher view counts (36,508 vs. 18,713), received a higher score (14 vs. 5), and had significantly more duplicates (3.8 vs. 3.0) on average. 34% of the posts provided by highly reputable so-called trusted users were insecure. Our findings show that there are lots of insecure snippets on SO, while the community-given feedback does not allow differentiating secure from insecure choices. Moreover, the reputation mechanism fails to indicate trustworthy users with respect to security questions, ultimately leaving other users wandering around alone in a software security minefield.

  • Conference Article
  • Cite count: 26
  • 10.1109/msr.2019.00042
Python Coding Style Compliance on Stack Overflow
  • May 1, 2019
  • Nikolaos Bafatakis + 6 more

Software developers all over the world use Stack Overflow (SO) to interact and exchange code snippets. Research also uses SO to harvest code snippets for use with recommendation systems. However, previous work has shown that code on SO may have quality issues, such as security or license problems. We analyse Python code on SO to determine its coding style compliance. From 1,962,535 code snippets tagged with 'python', we extracted 407,097 snippets of at least 6 statements of Python code. Surprisingly, 93.87% of the extracted snippets contain style violations, with an average of 0.7 violations per statement and a huge number of snippets with a considerably higher ratio. Researchers and developers should, therefore, be aware that code snippets on SO may not be representative of good coding style. Furthermore, while user reputation seems to be unrelated to coding style compliance, for posts with vote scores in the range between -10 and 20, we found a strong correlation (r = -0.87, p < 10^-7) between the vote score a post received and the average number of violations per statement for snippets in such posts.
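To make the "violations per statement" metric concrete, here is a minimal sketch that counts statements with Python's ast module and checks a single style rule, the 79-character line limit from PEP 8. The abstract does not say which checker or rule set the authors used, so the one-rule check and all function names here are assumptions for illustration only.

```python
import ast

MAX_LINE_LENGTH = 79  # PEP 8 limit for code lines


def count_statements(source):
    """Count every statement node in the snippet, including nested ones."""
    tree = ast.parse(source)
    return sum(isinstance(node, ast.stmt) for node in ast.walk(tree))


def long_line_violations(source):
    """Count lines longer than MAX_LINE_LENGTH (the only rule checked here)."""
    return sum(len(line) > MAX_LINE_LENGTH for line in source.splitlines())


def violations_per_statement(source):
    """Ratio of detected violations to statements; 0.0 for an empty snippet."""
    statements = count_statements(source)
    if statements == 0:
        return 0.0
    return long_line_violations(source) / statements
```

A real style checker applies many more rules, so ratios from this sketch would be lower than those reported above.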

  • Conference Article
  • Cite count: 15
  • 10.1145/3468264.3468582
Characterizing search activities on stack overflow
  • Aug 18, 2021
  • Jiakun Liu + 5 more

To solve programming issues, developers commonly search on Stack Overflow to seek potential solutions. However, there is a gap between the knowledge developers are interested in and the knowledge they are able to retrieve using search engines. To help developers efficiently retrieve relevant knowledge on Stack Overflow, prior studies proposed several techniques to reformulate queries and generate summarized answers. However, few studies performed a large-scale analysis using real-world search logs. In this paper, we characterize how developers search on Stack Overflow using such logs. By doing so, we identify the challenges developers face when searching on Stack Overflow and seek opportunities for the platform and researchers to help developers efficiently retrieve knowledge. To characterize search activities on Stack Overflow, we use search log data based on requests to Stack Overflow's web servers. We find that the most common search activity is reformulating the immediately preceding queries. Related work looked into query reformulations when using generic search engines and found 13 types of query reformulation strategies. Compared to their results, we observe that 71.78% of the reformulations can be fitted into those reformulation strategies. In terms of how queries are structured, 17.41% of the search sessions only search for fragments of source code artifacts (e.g., class and method names) without specifying the names of programming languages, libraries, or frameworks. Based on our findings, we provide actionable suggestions for Stack Overflow moderators and outline directions for future research. For example, we encourage Stack Overflow to set up a database that includes the relations between all computer programming terminologies shared on Stack Overflow, e.g., method name, data structure name, design pattern, and IDE name. By doing so, Stack Overflow could improve the performance of search engines by considering related programming terminologies at different levels of granularity.

  • Research Article
  • Cite count: 54
  • 10.1109/tse.2020.3023664
An Empirical Study of C++ Vulnerabilities in Crowd-Sourced Code Examples
  • Sep 4, 2020
  • IEEE Transactions on Software Engineering
  • Morteza Verdi + 5 more

Software developers share programming solutions in Q&A sites like Stack Overflow, Stack Exchange, Android forum, and so on. The reuse of crowd-sourced code snippets can facilitate rapid prototyping. However, recent research shows that the shared code snippets may be of low quality and can even contain vulnerabilities. This paper aims to understand the nature and the prevalence of security vulnerabilities in crowd-sourced code examples. To achieve this goal, we investigate security vulnerabilities in the C++ code snippets shared on Stack Overflow over a period of 10 years. In collaborative sessions involving multiple human coders, we manually assessed each code snippet for security vulnerabilities following CWE (Common Weakness Enumeration) guidelines. From the 72,483 reviewed code snippets used in at least one project hosted on GitHub, we found a total of 99 vulnerable code snippets categorized into 31 types. Many of the investigated code snippets are still not corrected on Stack Overflow. The 99 vulnerable code snippets found on Stack Overflow were reused in a total of 2859 GitHub projects. To help improve the quality of code snippets shared on Stack Overflow, we developed a browser extension that notifies Stack Overflow users of vulnerabilities in code snippets when they see them on the platform.

  • Conference Article
  • Cite count: 12
  • 10.1145/2851613.2851815
From discussion to wisdom
  • Apr 4, 2016
  • Jing Li + 3 more

Stack Overflow has been providing a question-answering service for 7 years. It has become a tremendous knowledge repository for developers' thoughts and practices. Hyperlinks in discussion threads of Stack Overflow are essential knowledge entities for programming on the Web, such as a software library, API documentation, a code example, or a tutorial. Tens of millions of hyperlinks are disseminated in Stack Overflow, while wisdom on what web resources have been highly recognized by the community is implicit in millions of discussion threads. In this paper, we develop the WisLinker framework that extracts knowledge from discussion, then turns knowledge into wisdom by learning through the knowledge dissemination history. With this wisdom, for a specific hyperlink that users are concerned with, WisLinker can recommend web resources highly recognized by the Stack Overflow community. We evaluate the validity of WisLinker in an open-ended setting using the Stack Overflow data dump. We also implement a browser extension for live recommendation of web resources while users browse web pages. WisLinker could enable more efficient exploratory search and information discovery of programming-related web resources.

  • Conference Article
  • Cite count: 33
  • 10.1145/3377811.3380430
Demystify official API usage directives with crowdsourced API misuse scenarios, erroneous code examples and patches
  • Jun 27, 2020
  • Xiaoxue Ren + 4 more

API usage directives in official API documentation describe the contracts, constraints and guidelines for using APIs in natural language. Through the investigation of API misuse scenarios on Stack Overflow, we identify three barriers that hinder the understanding of the API usage directives, i.e., lack of specific usage context, indirect relationships to cooperative APIs, and confusing APIs with subtle differences. To overcome these barriers, we develop a text mining approach to discover the crowdsourced API misuse scenarios on Stack Overflow and extract from these scenarios erroneous code examples and patches, as well as related APIs and confusing APIs, to construct demystification reports that help developers understand the official API usage directives described in natural language. We apply our approach to API usage directives in official Android API documentation and android-tagged discussion threads on Stack Overflow. We extract 159,116 API misuse scenarios for 23,969 API usage directives of 3138 classes and 7471 methods, from which we generate the demystification reports. Our manual examination confirms that the extracted information in the generated demystification reports is of high accuracy. In a user study of 14 developers on 8 API-misuse related error scenarios, we show that our demystification reports help developers understand and debug API-misuse related program errors faster and more accurately, compared with reading only plain API usage-directive sentences.

  • Research Article
  • Cite count: 31
  • 10.1145/3550150
I Know What You Are Searching for: Code Snippet Recommendation from Stack Overflow Posts
  • Apr 26, 2023
  • ACM Transactions on Software Engineering and Methodology
  • Zhipeng Gao + 5 more

Stack Overflow has been heavily used by software developers to seek programming-related information. More and more developers use Community Question and Answer forums, such as Stack Overflow, to search for code examples of how to accomplish a certain coding task. This is often considered to be more efficient than working from source documentation, tutorials, or full worked examples. However, due to the complexity of these online Question and Answer forums and the very large volume of information they contain, developers can be overwhelmed by the sheer volume of available information. This makes it hard to find and/or even be aware of the most relevant code examples to meet their needs. To alleviate this issue, in this work, we present a query-driven code recommendation tool, named Que2Code, that identifies the best code snippets for a user query from Stack Overflow posts. Our approach has two main stages: (i) semantically equivalent question retrieval and (ii) best code snippet recommendation. During the first stage, for a given query question formulated by a developer, we first generate paraphrase questions for the input query as a way of query boosting and then retrieve the relevant Stack Overflow posted questions based on these generated questions. In the second stage, we collect all of the code snippets within questions retrieved in the first stage and develop a novel scheme to rank code snippet candidates from Stack Overflow posts via pairwise comparisons. To evaluate the performance of our proposed model, we conduct a large-scale experiment to evaluate the effectiveness of the semantically equivalent question retrieval task and best code snippet recommendation task separately on Python and Java datasets in Stack Overflow. We also perform a human study to measure how real-world developers perceive the results generated by our model. Both the automatic and human evaluation results demonstrate the promising performance of our model, and we have released our code and data to assist other researchers.

  • Research Article
  • Cite count: 3
  • 10.1016/j.jss.2021.111063
Improved retrieval of programming solutions with code examples using a multi-featured score
  • Aug 12, 2021
  • Journal of Systems and Software
  • Rodrigo F Silva + 5 more

  • Conference Article
  • Cite count: 183
  • 10.1145/3180155.3180260
Are code examples on an online Q&A forum reliable?
  • May 27, 2018
  • Tianyi Zhang + 4 more

Programmers often consult an online Q&A forum such as Stack Overflow to learn new APIs. This paper presents an empirical study on the prevalence and severity of API misuse on Stack Overflow. To reduce manual assessment effort, we design ExampleCheck, an API usage mining framework that extracts patterns from over 380K Java repositories on GitHub and subsequently reports potential API usage violations in Stack Overflow posts. We analyze 217,818 Stack Overflow posts using ExampleCheck and find that 31% may have potential API usage violations that could produce unexpected behavior such as program crashes and resource leaks. Such API misuse has three main causes: missing control constructs, missing or incorrectly ordered API calls, and incorrect guard conditions. Even the posts that are accepted as correct answers or upvoted by other programmers are not necessarily more reliable than other posts in terms of API misuse. This result calls for a new approach to augment Stack Overflow with alternative API usage details that are not typically shown in curated examples.

  • Research Article
  • Cite count: 47
  • 10.1016/j.infsof.2020.106367
PostFinder: Mining Stack Overflow posts to support software developers
  • Jun 25, 2020
  • Information and Software Technology
  • Riccardo Rubei + 4 more

  • Conference Article
  • Cite count: 4
  • 10.1145/3382494.3422165
On the use of C# Unsafe Code Context
  • Oct 5, 2020
  • Ehsan Firouzi + 3 more

Background. C# maintains type safety and security by not allowing direct dangerous pointer arithmetic. To improve performance for special cases, pointer arithmetic is provided via an unsafe context. Programmers can use the C# unsafe keyword to encapsulate a code block, which can use pointer arithmetic. In the Common Language Runtime (CLR), unsafe code is referred to as unverifiable code. It then becomes the responsibility of the programmer to ensure the encapsulated code snippet is not dangerous. Naturally, this raises concern about whether such trust is misused by programmers when they promote the use of the C# unsafe context. Aim. We aim to analyze the prevalence and vulnerabilities of shared code examples using the C# unsafe keyword on the Stack Overflow (SO) code sharing platform. Method. Using regular expressions and manual checks, we extracted posts relevant to C# unsafe code from SO and categorized them into software development scenarios. Results. In the entire SO data dump of September 2018, we find 2,283 C# snippets with the unsafe keyword. Among those posts, 27% are about image processing, where unsafe code is mainly used for performance reasons. The second most popular category, with 21% of the code in the posts, is 'Interoperability': 'unsafe' is used to enable interoperability between C# managed code and unmanaged code. The 'stackalloc' operator is the third category, with 9% of unsafe code posts. The stackalloc operator allocates a block of memory on the stack. Since C# 7.2, Microsoft recommends against using 'stackalloc' in unsafe context whenever possible. Manual inspection shows 67 code snippets with dangerous functions that can introduce vulnerabilities (e.g., buffer overflow) if not used with caution. Finally, in 35% of the 'Interoperability' posts with the 'P/Invoke' tag, P/Invoke is used outside the NativeMethods class, which is contrary to Microsoft's design suggestion. Conclusion. Our study leads to 7 main findings, and these findings show the importance of cautiously using this feature.
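The regular-expression filtering step mentioned in the Method can be sketched as follows. The abstract does not list the authors' expressions, so the two patterns and the classify_snippet helper are hypothetical. Whole-word matching avoids identifiers such as unsafeCount, but it still matches the keyword inside comments and string literals, which is one reason manual checks remain necessary.

```python
import re

# Hypothetical patterns; the paper's actual regular expressions are not given.
UNSAFE_KEYWORD = re.compile(r"\bunsafe\b")
STACKALLOC = re.compile(r"\bstackalloc\b")


def classify_snippet(code):
    """Flag whether a C# snippet mentions the unsafe keyword or stackalloc."""
    return {
        "unsafe": bool(UNSAFE_KEYWORD.search(code)),
        "stackalloc": bool(STACKALLOC.search(code)),
    }
```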
