SETAP: Software engineering teamwork assessment and prediction using machine learning

Dragutin Petkovic, Kazunori Okada, Lorenzo Flores, Sonai Dubey, Shihong Huang, Swati Arora, Ramasubramanian Sreenivasen, Rainer Todtenhoefer, Marc Sosnick-Perez

doi:10.1109/fie.2014.7044199

Abstract

Effective teaching of teamwork skills in local and globally distributed Software Engineering (SE) teams is recognized as an important part of the education of current and future software engineers. Effective methods for assessment and early prediction of learning effectiveness in SE teamwork are not only a critical part of teaching but also of value in industrial training and project management. This paper presents a novel analytical approach to the assessment and, most importantly, the prediction of learning outcomes in SE teamwork, based on data from our joint software engineering class taught concurrently at San Francisco State University (SFSU), Florida Atlantic University (FAU) and Fulda University, Germany (Fulda). Our approach focuses on assessment and prediction of SE teamwork in terms of the ability of student teams to apply best SE processes and develop SE products. It differs from existing work in the following aspects: a) it develops and uses only objective and quantitative measures of team activity from multiple sources, such as statistics of student time use, software engineering tool use, and instructor observations; b) it applies machine learning (ML) techniques to these team activity measurements to identify quantitative and objective factors that can assess and predict learning of software engineering teamwork skills at the team level.
In this paper we make the following contributions: a) we present in detail, for the first time, the full team activity measurement data set we developed, consisting of over 40 objective and quantitative measures extracted from student teams working on class projects; b) we present an ML framework that applies the Random Forest (RF) algorithm to the team activity measurements and team outcomes, focusing on predicting teams that are likely to fail; c) we describe in detail our now fully implemented and operational data processing pipeline, consisting of data collection methods from multiple sources, ML training database creation, and ML analysis subsystems; and finally d) we present very preliminary ML analysis results based on data from 17 student teams in our joint software engineering classes in Fall 2012 and Spring 2013. While our ML training database is currently small, it continues to grow. Our preliminary results, verified with two independent accuracy measures, show that RF is able to predict SE Process and SE Product team performance in an intuitively explainable manner.

Full Text