Abstract

Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the increasingly sophisticated array of data available in many settings. These data tend to be nontraditional, in the sense that they are often live, large, complex, and/or messy. A first course in statistics at the undergraduate level typically introduces students to a variety of techniques to analyze small, neat, and clean datasets. However, whether they pursue more formal training in statistics or not, many of these students will end up working with data that are considerably more complex, and will need facility with statistical computing techniques. More importantly, these students require a framework for thinking structurally about data. We describe an undergraduate course in a liberal arts environment that provides students with the tools necessary to apply data science. The course emphasizes modern, practical, and useful skills that cover the full data analysis spectrum, from asking an interesting question to acquiring, managing, manipulating, processing, querying, analyzing, and visualizing data, as well as communicating findings in written, graphical, and oral forms. Supplementary materials for this article are available online. [Received June 2014. Revised July 2015.]

Highlights

  • The last decade has brought considerable attention to the field of statistics, as undergraduate enrollments have swollen across the country

  • What we present here is wholly consistent with the vision for the future of the undergraduate statistics curriculum articulated by Horton (2015) and the American Statistical Association Undergraduate Guidelines Workgroup (2014)

  • It is clear that the popularity of data science has brought both opportunities and challenges to the statistics profession


Summary

A Data Science Course for Undergraduates

Follow this and additional works at: https://scholarworks.smith.edu/mth_facpubs Part of the Statistics and Probability Commons. Benjamin, "A Data Science Course for Undergraduates: Thinking with Data" (2015). Mathematics and Statistics: Faculty Publications, Smith College, Northampton, MA. https://scholarworks.smith.edu/mth_facpubs/25

Introduction
Background and Related Work
The Course
Day One
Methods
Data Visualization
Computational Statistics
Machine Learning
Additional Topics
Computing
A Note to Prospective Instructors
Assignments
Epilogue
Discussion
