Abstract

[Background] Selecting an appropriate task is challenging for contributors to Open Source Software (OSS), especially for those contributing for the first time. Therefore, researchers and OSS projects have proposed various strategies to aid newcomers, including labeling tasks. [Aims] In this research, we investigate automatically labeling open issues as a strategy to help contributors pick a task. We label the issues with API domains: categories of the APIs parsed from the source code used to solve the issues. We plan to add social network analysis metrics gathered from the issue conversations as new predictors. By identifying the skills an issue requires, we expect contributor candidates to pick tasks better suited to their skills. [Method] We employ mixed methods. We qualitatively analyzed interview transcripts and open-ended survey answers to understand the strategies communities use to onboard contributors and those contributors use to pick an issue. We conducted quantitative studies to analyze the relevance of the API-domain labels in a user experiment and to compare the strategies' relative importance for diverse contributor roles. We also mined project and issue data from OSS repositories to build the ground truth and train predictors that infer the API-domain labels with precision, recall, and F-measure comparable to the state of the art. We also plan to use a skill ontology to support matching contributors to tasks. By quantitatively analyzing the confidence level of matches between ontologies describing contributors' skills and task requirements, we may recommend issues for contribution. In addition, we will measure the effectiveness of the API-domain labels by comparing the resolution time and resolution rate of labeled and unlabeled issues. [Results] So far, the results show that organizing the issues, which includes assigning labels, is seen as an essential strategy by diverse roles in OSS communities. The API-domain labels are relevant, mainly for experienced practitioners. The predicted labels have an average precision of 75.5%. [Conclusions] Labeling issues with API-domain labels indicates the skills an issue involves. The labels represent possible libraries (aggregated into domains) used in the source code related to an issue. By investigating this research topic, we expect to help new contributors find a task, supporting OSS communities in attracting and retaining more contributors.
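The abstract describes labeling issues with categories of APIs parsed from the source code that solved them. As a rough illustration only, not the authors' implementation, the sketch below parses the imports of Python files touched by an issue-fixing change and maps known library names to domain labels; the `LIBRARY_TO_DOMAIN` mapping and the function names are assumptions made for this example.

```python
import ast

# Hypothetical mapping from library names to API domains; the actual study
# derives these categories from the APIs observed in the projects' code.
LIBRARY_TO_DOMAIN = {
    "requests": "HTTP/Network",
    "sqlalchemy": "Database",
    "numpy": "Numerical Computation",
    "tkinter": "User Interface",
}

def extract_imports(source_code: str) -> set[str]:
    """Collect top-level module names imported by a Python source file."""
    tree = ast.parse(source_code)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules

def api_domain_labels(changed_files: dict[str, str]) -> set[str]:
    """Map the libraries used in the files that solved an issue to API-domain labels."""
    labels = set()
    for _path, code in changed_files.items():
        for module in extract_imports(code):
            if module in LIBRARY_TO_DOMAIN:
                labels.add(LIBRARY_TO_DOMAIN[module])
    return labels

# Toy example: files touched by the commit that closed an issue.
files = {"client.py": "import requests\nfrom sqlalchemy import create_engine\n"}
print(api_domain_labels(files))  # {'HTTP/Network', 'Database'}
```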
