Using the random forest classifier to assess and predict student learning of Software Engineering Teamwork

Dragutin Petkovic, Arthur Vigil, Nidhi Miglani, Rainer Todtenhoefer, Shihong Huang, Marc Sosnick-Perez, Kazunori Okada

doi:10.1109/fie.2016.7757406

Abstract

The overall goal of our Software Engineering Teamwork Assessment and Prediction (SETAP) project is to develop effective machine-learning-based methods for assessing and predicting, early in a course, how effectively students learn software engineering teamwork. Specifically, we use the Random Forest (RF) machine learning (ML) method to predict the effectiveness of software engineering teamwork learning from data collected during student team project development. These data include over 100 objective and quantitative Team Activity Measures (TAM), obtained by monitoring and measuring the activities of student teams as they created their final class projects in our joint software engineering (SE) classes, which ran concurrently at San Francisco State University (SFSU), Fulda University (Fulda) and Florida Atlantic University (FAU). In this paper we present the first results of RF analysis performed at SFSU on our full data set, which covers four years of our joint SE classes. The data set comprises 74 student teams with over 380 students, totaling over 30,000 discrete data points. The data are grouped into 11 time intervals, each covering an important phase of project development during the class (e.g. early requirement gathering and design, development, testing and delivery). We briefly describe the data collection methods and the data itself, and then show the prediction results of the RF analysis applied to the full data set. The results show that, early in the class, we can detect student teams that are bound to fail or need attention with good (about 70%) accuracy. Moreover, the variable importance analysis shows that the features (TAM measures) with high predictive power make intuitive sense: they include late delivery/late responses, time used to help each other, and, surprisingly, statistics on commit messages to the code repository.
In summary, we believe these results demonstrate the viability of using ML on objective and quantitative team activity measures to predict student learning of software engineering teamwork. They also point to easy-to-measure factors that can guide educators and software engineering managers in implementing early intervention for teams bound to fail. Details about the project and the complete ML training database can be downloaded from the project web site.
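To make the workflow described in the abstract concrete (train a random forest on per-team activity measures, estimate prediction accuracy, then rank features by importance), the following is a minimal sketch rather than the authors' implementation. It is a simplified pure-Python random forest run on synthetic data. The feature names, the class labels `at_risk` and `ok`, the rule that generates the labels, and all parameter values are assumptions made for illustration; none of them come from the SETAP data set. Feature importance is estimated here by permutation (the drop in accuracy after shuffling one feature), which may differ from the measure used in the paper.

```python
import random
from collections import Counter

# Hypothetical TAM-style feature names; the real SETAP features are not listed here.
FEATURES = ["late_deliveries", "help_hours", "commit_messages", "noise"]

def make_team(rng):
    """Synthetic team: the label depends only on late deliveries (invented rule)."""
    late = rng.randint(0, 6)
    row = [late, rng.uniform(0, 10), rng.randint(0, 50), rng.random()]
    return row, ("at_risk" if late >= 3 else "ok")

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return (weighted Gini, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, rng, n_try, depth=0, max_depth=4):
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    # Each split considers only a random subset of the features.
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], rng, n_try, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], rng, n_try, depth + 1, max_depth))

def predict_tree(tree, row):
    # Internal nodes are (feature, threshold, left, right) tuples; leaves are labels.
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw teams with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_try))
    return forest

def predict(forest, row):
    # Majority vote across all trees.
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def accuracy(forest, rows, labels):
    return sum(predict(forest, r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(forest, rows, labels, feature, rng):
    """Accuracy drop after shuffling one feature column across teams."""
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return accuracy(forest, rows, labels) - accuracy(forest, permuted, labels)

data_rng = random.Random(42)
train = [make_team(data_rng) for _ in range(74)]   # synthetic, not the SETAP teams
test = [make_team(data_rng) for _ in range(40)]
train_rows, train_labels = [r for r, _ in train], [y for _, y in train]
test_rows, test_labels = [r for r, _ in test], [y for _, y in test]

forest = train_forest(train_rows, train_labels)
test_accuracy = accuracy(forest, test_rows, test_labels)
perm_rng = random.Random(7)
importances = {name: permutation_importance(forest, test_rows, test_labels, i, perm_rng)
               for i, name in enumerate(FEATURES)}

print("held-out accuracy:", test_accuracy)
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(name, round(value, 3))
```

Because the synthetic labels are generated from `late_deliveries` alone, that feature should dominate the importance ranking. On real team data the ranking would have to be interpreted with more care, since correlated features can share or mask importance.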

Full Text