A Study of C/C++ Code Weaknesses on Stack Overflow

Haoxiang Zhang, Heng Li, Shaowei Wang, Ahmed E Hassan, Tse-Hsun Chen

doi:10.1109/tse.2021.3058985

Abstract

Stack Overflow hosts millions of solutions that aim to solve developers’ programming issues. Through this crowdsourced question-answering process, Stack Overflow has become a code hosting website where developers actively share their code. However, code snippets on Stack Overflow may contain security vulnerabilities, and if shared carelessly, such snippets can introduce security problems into software systems. In this paper, we empirically study the prevalence of Common Weakness Enumeration (CWE) instances in the code snippets of C/C++-related answers. We explore the characteristics of $Code_w$, i.e., code snippets that have CWE instances, in terms of the types of weaknesses, the evolution of $Code_w$, and who contributed such code snippets. We find that: 1) 36 percent (i.e., 32 out of 89) of CWE types are detected in $Code_w$ on Stack Overflow. In particular, CWE-119, i.e., improper restriction of operations within the bounds of a memory buffer, is common in both answer code snippets and real-world software systems. Furthermore, the proportion of $Code_w$ doubled from 2008 to 2018 after normalizing by the total number of C/C++ snippets in each year. 2) In general, code revisions are associated with a reduction in the number of code weaknesses. However, the majority of $Code_w$ had weaknesses introduced in the first version of the code, and these snippets were never revised afterward. 3) Only 7.5 percent of users who contributed C/C++ code snippets posted or edited code with weaknesses. Users contributed less code with CWE weaknesses when they were more active (i.e., they either revised more code snippets or had a higher reputation). We also find that some users tended to repeat the same CWE type across their code snippets. Our empirical study provides insights to users who share code snippets on Stack Overflow so that they are aware of the potential security issues. To understand the community feedback on fixing code weaknesses through answer revisions, we also conduct a qualitative study and find that 62.5 percent of our suggested revisions are adopted by the community. Stack Overflow could perform CWE scanning on all the code hosted on its platform. Further research is needed to improve the quality of the crowdsourced knowledge on Stack Overflow.
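To make the CWE-119 category named in the abstract concrete, the following is a minimal illustrative sketch in C. It is not taken from the study's dataset; the function name `copy_name_checked` and the 8-byte buffer in the comment are hypothetical, chosen only to show an unchecked copy next to a bounds-checked alternative.

```c
#include <stddef.h>
#include <string.h>

/*
 * Weak pattern, shown only as a comment so the block stays safe to run.
 * strcpy performs no bounds check, so any input of 8 or more characters
 * writes past the end of `name` (an instance of CWE-119):
 *
 *     char name[8];
 *     strcpy(name, user_input);
 */

/*
 * Hypothetical bounds-checked alternative (an assumption for illustration).
 * Copies `src` into `dst`, which holds `dst_size` bytes.
 * Returns 0 on success. Returns -1 if an argument is invalid or if `src`
 * plus its terminating NUL would not fit; in the "does not fit" case
 * `dst` is set to the empty string.
 */
int copy_name_checked(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0) {
        return -1;
    }

    size_t len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';          /* leave a valid empty string behind */
        return -1;
    }

    memcpy(dst, src, len + 1);  /* len + 1 also copies the terminating NUL */
    return 0;
}
```

A caller would pass the real size of the destination, for example `copy_name_checked(buf, sizeof buf, input)`, and treat a return value of -1 as a rejected input rather than truncating silently.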

Full Text