Abstract

We provide a computational exercise suitable for early introduction in an undergraduate statistics or data science course that allows students to “play the whole game” of data science: performing both data collection and data analysis. While many teaching resources exist for data analysis, resources for data collection are less abundant given the inherent difficulty of the task. Our proposed exercise centers on student use of Google Calendar to collect data with the goal of answering the question “How do I spend my time?” The exercise answers a question with near-universal appeal, yet its data collection mechanism remains within reach of a typical undergraduate student. A further benefit is that the exercise provides an opportunity to discuss the ethical questions and considerations that data providers and data analysts face in today’s age of large-scale internet-based data collection.

Highlights

  • The title of our article refers to the reality that to master a subject, one needs to do more than practice the individual, elemental, and necessary parts.

  • Two of the most seminal contributions to modernizing the statistics and data science curriculum, National Academies of Sciences, Engineering, and Medicine (2018) and Nolan and Temple Lang (2010), both emphasize the importance of teaching and practicing data wrangling. We argue that another such gap in curricula is the inadequate treatment of data collection.

  • Building toward the goal of having students experience the entire data analysis process, we propose a data collection activity that allows students to mimic the real-life data collection conducted by numerous internet-age organizations in industry, media, government, and academia.


Summary

Introduction

The title of our article refers to the reality that to master a subject, one needs to do more than practice the individual, elemental, and necessary parts. The cyclical nature of the “Understand” portion emphasizes that in many substantive data analysis projects, original models and visualizations need updating, which requires many iterations through this cycle until a desired outcome can be communicated. This framing encourages a holistic view of the elements of a typical data science project.
