Abstract
With the rise of AI in software engineering (SE), researchers have shown how AI can be applied to assist software developers in a wide variety of activities. However, this rise has not been accompanied by a complementary increase in labelled datasets, which many supervised learning methods require. In recent years, several studies have used crowdsourcing platforms to collect labelled training data. However, research has shown that the quality of labelled data is unstable due to participant bias, variance in knowledge, and task difficulty. We therefore present CodeLabeller, a web-based tool that aims to handle the labelling of Java source files at scale more efficiently by improving the data collection process throughout, and to improve the reliability of responses by requiring each labeller to attach a confidence rating to each of their responses. We test CodeLabeller by constructing a corpus of over a thousand source files obtained from a large collection of open-source Java projects and labelling each Java source file with its respective design patterns and a summary. Apart from assisting researchers in crowdsourcing a labelled dataset, the tool has practical applicability in software engineering education and assists in building expert ratings for software artefacts. This paper discusses the motivation behind the creation of CodeLabeller, its intended users, a demonstration of the tool and its UI, its implementation, its benefits, and lastly its evaluation through a user study and in-practice usage.