Abstract

The proliferation of large, complex datasets has challenged universities to keep up with the demand for graduates trained in both the statistical and the computational skills required to plan, acquire, manage, and analyze such data and to communicate the findings effectively. To meet this demand, attracting students to data science early and giving them a solid introduction to the field become increasingly important. We present a case study of an introductory undergraduate course in data science designed to address these needs. Offered at Duke University, the course has no prerequisites and serves a wide audience of aspiring statistics and data science majors as well as humanities, social sciences, and natural sciences students. We discuss the unique challenges posed by offering such a course and, in light of these challenges, present a detailed discussion of the pedagogical design elements, content, structure, computational infrastructure, and assessment methodology of the course. We also offer a repository containing all teaching materials, which are open-source, along with supplementary materials and the R code for reproducing the figures found in the article.

Highlights

  • How can we effectively and efficiently teach data science to students with little to no background in computing and statistical thinking? How can we equip them with the skills and tools for reasoning with various types of data and leave them wanting to learn more? This paper describes an introductory data science course that is our answer to these questions. At its core, the course focuses on data acquisition and wrangling, exploratory data analysis, data visualization, inference, modeling, and effective communication of results.

  • We offer a repository containing all teaching materials that are open-source, along with supplemental materials and the R code for reproducing the figures found in the paper. Keywords: data science curriculum, exploratory data analysis, data visualization, modeling, reproducibility, R.

  • The course has served as a way to start building bridges between the introductory statistical science and computer science curricula, accelerating the formation of an interdepartmental major in data science, in which students can build a full undergraduate curriculum in data science by mixing and matching from a list of prescribed courses from the two departments.

Summary

Introduction

How can we effectively and efficiently teach data science to students with little to no background in computing and statistical thinking? How can we equip them with the skills and tools for reasoning with various types of data and leave them wanting to learn more? This paper describes an introductory data science course that is our (working) answer to these questions. We present a synopsis of the course content and structure of introductory data science courses at four other institutions with the goal of providing a snapshot of the current state of affairs in undergraduate introductory data science curricula.

Background and related work
Unit 3 - Looking forward
Literate programming and reproducibility with R Markdown
Clean and consistent grammar with the tidyverse
Assessment
Findings
Discussion
