Abstract

A version control system records changes to a file or set of files over time so that changes can be tracked and specific versions of a file can be recalled later. As such, it is an essential element of a reproducible workflow that deserves due consideration among the learning objectives of statistics courses. This article describes experiences and implementation decisions of four contributing faculty who are teaching different courses at a variety of institutions. Each of these faculty has set version control as a learning objective and successfully integrated one such system (Git) into one or more statistics courses. The various approaches described in the article span different implementation strategies to suit student background, course type, software choices, and assessment practices. By presenting a wide range of approaches to teaching Git, the article aims to serve as a resource for statistics and data science instructors teaching courses at any level within an undergraduate or graduate curriculum.

Introduction

Nolan & Temple Lang (2010) promote “version control” as a key topic for statistical analysis when coordinating work across a team. Version control is an important foundation for reproducible workflows, be they collaborative (maintaining versions of files that are being modified by teams) or non-collaborative (tracking analysis histories and providing analysis provenance). It forms a necessary part of a reproducible workflow and deserves due consideration among the learning objectives of statistics and data science courses. We begin by discussing our motivations for identifying version control as a learning objective, then summarize the courses taught by the four contributing faculty, highlighting the different implementation strategies each chose based on student audience, course type, software choices, and assessment practices.

  • RStudio: an Integrated Development Environment (IDE), i.e., a front end, for R that offers integration with Git. (rstudio.com)

  • RStudio Server Pro: a server-based version of RStudio that can be installed for free for academic use by instructors or institutions. (rstudio.com/products/rstudio-server-pro)

  • RStudio Cloud: a cloud-based version of RStudio software on servers provisioned by RStudio. (rstudio.cloud)

Motivation for version control
Method
Common features of the courses
Course description
Tools and implementation
First exposure in class
Workflow
Assessment
A subsequent Data Science course
Other remarks
Students need to see value of these expert-friendly tools
Start slowly and keep it simple
Not one single path
Peer review
Creating portfolios
Automation and workflow
Findings
Closing thoughts
