Abstract

Recognizing the current activity of a software developer (e.g., debugging, reading, editing) can improve the effectiveness of recommendation systems that aim to reduce the cognitive load of information lookup during software development. Current recommendation systems based on developer activity detection focus primarily on a single dimension of developer behavior, e.g., they both observe and recommend only IDE commands, or only source code classes. In addition, current state-of-the-art techniques require that the number and type of activities exhibited by developers be pre-specified and that labeled interaction data be provided as input. In this article, we propose an approach that eschews these requirements, leveraging an unsupervised statistical model that uses both IDE commands and source code accesses to discern latent developer activities. Our approach outperforms baseline supervised and unsupervised learning techniques in a simulation-based evaluation of source code and command recommendations for most of the configurations we examined. We also show that our technique benefits from observing both commands and code accesses when identifying developer activity.
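To make the idea of discerning latent activities from two observation streams concrete, the following minimal sketch (not the authors' implementation) treats each interaction as a pair of an IDE command and a source code element, assumes the two are conditionally independent given the hidden activity, and runs standard HMM forward filtering to estimate which activity the developer is currently in. The function names, command strings, file names, and all probabilities are hypothetical; the model described in the article infers its states and parameters from data rather than taking them as fixed inputs.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical event encoding: (IDE command, source code element).
Event = Tuple[str, str]


def emission_prob(state: int, event: Event,
                  cmd_dist: List[Dict[str, float]],
                  code_dist: List[Dict[str, float]],
                  floor: float = 1e-6) -> float:
    """Joint emission probability of one event under one activity.

    Assumes the command and the code element are conditionally independent
    given the activity, so their probabilities multiply. Unseen items get a
    small floor probability instead of zero.
    """
    command, element = event
    return cmd_dist[state].get(command, floor) * code_dist[state].get(element, floor)


def forward_filter(events: Sequence[Event],
                   initial: List[float],
                   transition: List[List[float]],
                   cmd_dist: List[Dict[str, float]],
                   code_dist: List[Dict[str, float]]) -> List[float]:
    """Return p(activity at the last step | all events so far)."""
    n = len(initial)
    belief = initial[:]
    for t, event in enumerate(events):
        if t > 0:
            # Predict: propagate the previous belief through the transition matrix.
            belief = [sum(belief[i] * transition[i][j] for i in range(n))
                      for j in range(n)]
        # Update: weight each activity by how well it explains the joint event.
        belief = [belief[j] * emission_prob(j, event, cmd_dist, code_dist)
                  for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
    return belief


if __name__ == "__main__":
    # Two hypothetical activities: 0 ~ "debugging", 1 ~ "editing".
    cmd_dist = [{"Debug.StepOver": 0.6, "Edit.Paste": 0.1, "View.Find": 0.3},
                {"Debug.StepOver": 0.05, "Edit.Paste": 0.7, "View.Find": 0.25}]
    code_dist = [{"Parser.java": 0.5, "Lexer.java": 0.5},
                 {"Parser.java": 0.2, "Lexer.java": 0.8}]
    events = [("Debug.StepOver", "Parser.java"), ("Debug.StepOver", "Parser.java")]
    print(forward_filter(events, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
                         cmd_dist, code_dist))
```

With these toy values, two consecutive step-over commands on Parser.java place more than 99% of the posterior mass on the first (debugging-like) activity. The code element contributes evidence alongside the command, which is the intuition behind observing both dimensions.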

Highlights

  • As software continues to grow in size and complexity, software developers need to recall numerous and heterogeneous fragments of information to effectively perform their daily work

  • We propose a technique for joint modeling of source code accesses and Integrated Development Environment (IDE) commands that uses both data dimensions to improve the quality of activity detection and the activity-aware recommendation of source code and IDE commands

  • To understand the effectiveness of our model for activity-aware recommendation systems, we investigate the following questions: RQ1: Can activity modeling with Hierarchical Dirichlet Process Hidden Markov Models (HDP-HMMs), based on commands and code interactions, aid in activity-based recommendation of (a) code elements and (b) IDE commands?
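RQ1 asks whether inferred activities help recommend code elements and IDE commands. One plausible way to turn an activity estimate into recommendations, shown here as a hypothetical sketch rather than the article's method, is to predict the distribution over the next activity with the transition matrix and then rank candidate items by their probability marginalized over activities. The function name, the k cutoff, and the alphabetical tie-breaking rule are assumptions; the same function serves commands or code elements depending on which per-activity distributions are passed in.

```python
from typing import Dict, List, Tuple


def recommend(posterior: List[float],
              transition: List[List[float]],
              item_dist: List[Dict[str, float]],
              k: int = 3) -> List[Tuple[str, float]]:
    """Rank items (IDE commands or code elements) for the next interaction.

    posterior:  current belief over activities, e.g. from HMM forward filtering
    transition: transition[i][j] = p(next activity j | current activity i)
    item_dist:  item_dist[j][item] = p(item | activity j)
    """
    n = len(posterior)
    # Predicted distribution over the activity of the next interaction.
    next_state = [sum(posterior[i] * transition[i][j] for i in range(n))
                  for j in range(n)]
    scores: Dict[str, float] = {}
    for j, weight in enumerate(next_state):
        for item, p in item_dist[j].items():
            # Marginalize over activities: p(item) = sum_j p(activity j) * p(item | activity j)
            scores[item] = scores.get(item, 0.0) + weight * p
    # Highest score first; ties broken alphabetically for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

Because the ranking mixes the per-activity distributions by the predicted activity weights, an uncertain activity estimate naturally yields a blend of recommendations rather than committing to a single activity.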


INTRODUCTION

As software continues to grow in size and complexity, software developers need to recall numerous and heterogeneous fragments of information to effectively perform their daily work. Such information is often scattered across internal resources (e.g., program elements, documentation, tests) and external resources (e.g., Q&A forums, tutorials, blog posts). A key strength of the approach described in this article is that it does not require a predetermined, fixed set of activities; instead, the number of activities is inferred from the developers' interactions. Another advantage of the proposed approach is that it is unsupervised and does not require labeled data, which is hard to obtain at scale and from a sufficiently diverse set of developers.
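The property that the number of activities need not be fixed in advance comes from the Dirichlet process prior underlying HDP-HMMs. The sketch below illustrates only that ingredient, using a Chinese restaurant process, the sequential view of a single Dirichlet process: each new interaction joins an existing cluster in proportion to the cluster's size, or opens a new cluster with weight alpha, so the number of clusters grows with the data instead of being set beforehand. It is a simplified, hypothetical illustration; the function name and parameters are assumptions, and a full HDP-HMM additionally shares states across the rows of the transition matrix and conditions on the observed commands and code accesses, which this snippet ignores.

```python
import random
from typing import List


def chinese_restaurant_process(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Assign n interactions to clusters without fixing the cluster count.

    Returns counts, where counts[k] is the number of interactions in cluster k.
    """
    rng = random.Random(seed)
    counts: List[int] = []
    for i in range(n):
        # i interactions are already seated, so sum(counts) == i.
        # Existing cluster k is chosen w.p. counts[k] / (i + alpha);
        # a new cluster is opened w.p. alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                counts[k] += 1
                break
        else:
            counts.append(1)  # no existing cluster chosen: open a new one
    return counts
```

Larger values of alpha tend to produce more clusters for the same number of interactions, while a very small alpha keeps nearly all interactions in one cluster.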
