Data Science Education – A Scoping Review
Aim/Purpose: This study aimed to evaluate the extant research on data science education (DSE) to identify the existing gaps, opportunities, and challenges, and make recommendations for current and future DSE. Background: There has been an increase in the number of data science programs especially because of the increased appreciation of data as a multidisciplinary strategic resource. This has resulted in a greater need for skills in data science to extract meaningful insights from data. However, the data science programs are not enough to meet the demand for data science skills. While there is growth in data science programs, they appear more as a rebranding of existing engineering, computer science, mathematics, and statistics programs. Methodology: A scoping review was adopted for the period 2010–2021 using six scholarly multidisciplinary databases: Google Scholar, IEEE Xplore, ACM Digital Library, ScienceDirect, Scopus, and the AIS Basket of eight journals. The study was narrowed down to 91 research articles and adopted a classification coding framework and correlation analysis for analysis. Contribution: We theoretically contribute to the growing body of knowledge about the need to scale up data science through multidisciplinary pedagogies and disciplines as the demand grows. This paves the way for future research to understand which programs can provide current and future data scientists the skills and competencies relevant to societal needs. Findings: The key results revealed the limited emphasis on DSE, especially in non-STEM (Science, Technology, Engineering, and Mathematics) disciplines. In addition, the results identified the need to find a suitable pedagogy or a set of pedagogies because of the multidisciplinary nature of DSE. Further, there is currently no existing framework to guide the design and development of DSE at various education levels, leading to sometimes inadequate programs. 
The study also noted the importance of various stakeholders who can contribute towards DSE and thus create opportunities in the DSE ecosystem. Most of the research studies reviewed were case studies that presented more STEM programs as compared to non-STEM. Recommendations for Practitioners: We recommend CRoss Industry Standard Process for Data Mining (CRISP-DM) as a framework to adopt collaborative pedagogies to teach data science. This research implies that it is important for academia, policymakers, and data science content developers to work closely with organizations to understand their needs. Recommendation for Researchers: We recommend future research into programs that can provide current and future data scientists the skills and competencies relevant to societal needs and how interdisciplinarity within these programs can be integrated. Impact on Society: Data science expertise is essential for tackling societal issues and generating beneficial effects. The main problem is that data is diverse and always changing, necessitating ongoing (up)skilling. Academic institutions must therefore stay current with new advances, changing data, and organizational requirements. Industry experts might share views based on their practical knowledge. The DSE ecosystem can be shaped by collaborating with numerous stakeholders and being aware of each stakeholder’s function in order to advance data science internationally. Future Research: The study found that there are a number of research opportunities that can be explored to improve the implementation of DSE, for instance, how can CRISP-DM be integrated into collaborative pedagogies to provide a fully comprehensive data science curriculum?
- Conference Article
4
- 10.1109/fie56618.2022.9962532
- Oct 8, 2022
This Innovative-Practice Full Paper presents the curriculum development and our experiences in offering a client-facing consulting course in data science. Data science education has seen rapid growth over the past decade. To provide students with hands-on opportunities to work with real data, many data science programs have advocated for and implemented experiential learning opportunities throughout the curriculum, which has been shown in a wide variety of literature to have many benefits. Most experiential learning opportunities in STEM programs are provided through capstone and engineering design courses; this is becoming increasingly the case in data science programs as well where several universities have developed data science capstone programs in which students work with clients on the client’s real-world data sets. While client-sponsored capstone projects are an exemplar of experiential learning, they may pose major challenges to implement and can be particularly resource-intensive for institutions; this is especially the case in data science where the legalities of data sharing may come with additional hurdles. Because of this, we were motivated to develop a novel client-facing data science consulting course that provides a unique experiential learning scenario to both undergraduate and graduate students while requiring much fewer resources and legalities. In our novel data science consulting course, groups of students work directly with real clients in a consulting clinic setting to provide data science guidance and short-term help with data science challenges. Through this process, students learn about the diversity of real-world problems in data science, how to lead consultations with clients effectively as a team, how to frame data science challenges and research possible solutions, and how to communicate solutions to clients in reports and presentations. 
We leveraged best practices in consulting courses developed in business school settings to design our course. Additionally, the consulting course serves as a community service initiative whereby researchers, clinicians, non-profit and government workers, and industry professionals benefit from the advice and short-term help provided through consultation. In this paper, we report how our consulting course is set up, how clients from both within and outside the university can seek help at the consulting clinic, and how the structure of the course enables students to have firsthand experience working on many real-world data science problems with clients. Finally, we discuss how student performance is assessed in this course, the lessons learned from offering this course, and recommendations for other data science programs in universities that wish to design similar courses.
- Research Article
- 10.52731/lir.v001.016
- Jan 1, 2022
- IIAI Letters on Institutional Research
In accordance with the AI Strategy 2019 issued by the Cabinet Office, Kyushu Institute of Information Sciences has established and is operating the KIIS Mathematical and Data Science and AI Education Program at the literacy level and the applied basic level. In particular, the literacy level can be completed by all students, and is recommended for all students to acquire the basic data science knowledge necessary after employment. In this paper, we took up the subject of "Exercise of Information Literacy," which is positioned as an introductory course for data science education, and examined the possibility of its application to data science education for existing courses, using the report assignments in the lectures and the results of questionnaires. The results show that, in both departments, the final survey displayed higher marks than the initial survey, indicating that each student was able to master the lecture content and basic skills during the lecture. This brought about a certain level of educational effects in both humanities and sciences. Differences in data science backgrounds exist between humanities and science students, and it is important to fully take these into account in educational practices.
- Research Article
- 10.3390/educsci15070878
- Jul 9, 2025
- Education Sciences
Critical data literacy (CDL) has emerged as a crucial component in data science education, transcending traditional disciplinary boundaries. Promoting CDL requires collaborative approaches to enhance learners’ skills in data science, going beyond mere quantitative reasoning to encompass a comprehensive understanding of data workflows and tools. Despite the growing literature on CDL, there is still a need to explore how students use data science practices for supporting the learning of CDL throughout a summer-long data science program. Drawing on situative perspectives of learning, we utilize a descriptive case study to address our research question: How do data science practices taught in a classroom setting differ from those enacted in real-world social justice projects? Key findings reveal that while the course focused on abstract principles and basic technical skills, the Food Justice Project provided students with a more applied understanding of data tools, ethics, and exploration. Through the project, students demonstrated a deeper engagement with CDL, addressing real-world issues through detailed data analysis and ethical considerations. This manuscript adds to the literature within data science education and has the potential to bridge the gap between theoretical knowledge and practical application, preparing students to address real-world data science challenges through their coursework.
- Conference Article
1
- 10.1109/weef-gedc54384.2022.9996248
- Nov 27, 2022
In response to the industry demand for data science skills, universities have created new data science degrees and integrated new data science courses into existing degrees. While data science is now being taught at several universities, there is still limited consensus among instructors on the best way to teach data science. Interviews and surveys with data science instructors revealed that they find it difficult to accommodate diverse student cohorts. Students that enrol in data science courses or degrees have differences in background knowledge, are at various stages of their careers, have various levels of commitment and prefer different learning styles. Although the challenges of teaching data science to diverse student cohorts are often stressed, limited methodologies or guidelines have been developed in response. This paper presents the design of a scaffolding framework developed to teach data science programming skills to a diverse student cohort. The scaffolding framework outlined can be used by instructors to design a project-based data science course that progressively challenges the development of data science programming and self-scaffolding skills.
- Conference Article
7
- 10.1145/3304221.3325533
- Jul 2, 2019
Over the past two decades, data science or data analytics degree programs have begun to emerge, reflecting the world's demand for data specialists to make sense of the vast amounts of collected data in the sciences, engineering, business, and other domains. As degree creation has occurred mainly due to demand, ACM and other professional bodies have recently stepped in to provide curricular guidance. However, no shared global framework for data science as an academic discipline exists, making growth unfocused and driven by employer demands. More recently, the growth of artificial intelligence has also impacted data science programs. This working group builds on prior efforts and participant experiences to develop a global taxonomy of approaches to data science education and expectations for graduates of data science programs to think like data scientists.
- Research Article
40
- 10.1080/10691898.2020.1851159
- Jan 1, 2021
- Journal of Statistics and Data Science Education
In the past 10 years, new data science courses and programs have proliferated at the collegiate level. As faculty and administrators enter the race to provide data science training and attract new students, the road map for teaching data science remains elusive. In 2019, 69 college and university faculty teaching data science courses and developing data science curricula were surveyed to learn about their curricula, computing tools, and challenges they face in their classrooms. Faculty reported teaching a variety of computing skills in introductory data science (albeit fewer computing topics than statistics topics), and that one of the biggest challenges they face is teaching computing to a diverse audience with varying preparation. The ever-evolving nature of data science is a major hurdle for faculty teaching data science courses, and a call for more data science teaching resources was echoed in many responses.
- Research Article
- 10.21900/j.alise.2025.2061
- Oct 3, 2025
- Proceedings of the ALISE Annual Conference
Since 2021, the University of Illinois Urbana-Champaign (UIUC) has been one of the universities hosting the Bolashak International Scholarship. The scholarship aims to prepare scholars and professionals to work on priority sectors of Kazakhstan’s economy (Bolashak International Scholarship, 2025). At UIUC, the Bolashak International Scholarship is coordinated by Global Education and Training (GET), Illinois International. In 2024, GET invited the School of Information Sciences (iSchool) to set up an educational partnership to co-host data science Bolashak fellows. The partnership was implemented through the establishment of one team representing GET and one team representing the iSchool. The GET and the iSchool team collaborate to design a one-year data science program composed of academic events, mentoring, and course audit. The data science program concept is taking into consideration the scholars’ needs and interests, and relies on data science instructors’ expertise, and previous literature on teaching data science and data science programs (Wing, 2019; Brunner, Kim, 2016; Kross, Guo, 2019; Rokem et al., 2015; Tang and Sae-Lim, 2016). Although the implementation of the program faces challenges, such as finding more faculty to participate in the program, a lack of partnership policies and definition of roles, appropriate data science curriculum for one year, the partnership is cooperative and is taking shape as a coalition partnership (Tushnet, 1993; Berliner, 1997). Co-hosting Bolashak fellows has driven a research agenda that integrates research, practice, and policies to create a data science program based upon the iSchool’s unique perspective working at the intersection of people, technology, and information.
- Research Article
- 10.18411/trnio-07-2023-366
- Jan 1, 2023
- ТЕНДЕНЦИИ РАЗВИТИЯ НАУКИ И ОБРАЗОВАНИЯ
This article is an overview of the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology, one of the most widely used methodologies for developing data science projects. CRISP-DM is a structured, iterative process that covers every stage of the data science project life cycle, from understanding business goals and objectives to deploying and evaluating results.
- Book Chapter
3
- 10.1007/978-3-030-36178-5_44
- Jan 1, 2020
In recent years, Data Mining has grown significantly in almost every field. Sectors such as banking, insurance, pharmaceuticals and retailing utilize data mining techniques widely to reduce costs, improve research and increase sales. However, as large projects are carried out in this area, standards for data mining techniques are required. The CRoss-Industry Standard Process for Data Mining (CRISP-DM) is arguably the most important effort to meet this need. CRISP-DM is used in many studies, has grown into an industry standard, and is defined as a series of sequential steps that guide the application of data mining techniques. The CRISP-DM reference model for data mining provides an overview of the life cycle of a data mining project and includes the phases, related tasks and outputs of a project. CRISP-DM is an effort to provide industrial standards for DM applications, including business understanding, data understanding, data preparation, modeling, evaluation and deployment steps. The quality and accuracy of each of the CRISP-DM steps related to DM applications used in different fields is very important for the success of the whole project. In this study, the CRISP-DM algorithm and steps are first investigated; then the data-related stages of CRISP-DM usage in a project (data monitoring and evaluation) are examined and explained using an example application. In this process, the supplier database based on the CRISP-DM algorithm for Data Mining has been analyzed and processed. The aim of this study is to explain the role of the steps of CRISP-DM, and especially to process, understand and prepare information in the process of data discovery. In this way, the study is expected to create awareness about CRISP-DM and to make the impact of this process on projects clearly understood.
- Research Article
- 10.1162/99608f92.75aed58b
- Oct 29, 2020
- Harvard Data Science Review
Two-year colleges are poised to play a substantial and possibly transformative role in data science and undergraduate data science education. Current two-year college data science programs provide affordable and rigorous certificate and degree experiences that instill data acumen in students who do not seek or do not fit within the traditional four-year college paradigm. Additional two-year college data science certificate and degree programs are certain to develop as conversations continue with four-year colleges regarding matters of transferability, student achievement, and program evolution. Developing a two-year college data science program is not an easy task, but deep discussions of postsecondary data science education will be incomplete if they fail to consider the opportunities that two-year colleges provide. Two-year college data science educators will therefore need to continue to be active participants in the discussions of both the field and its educational practices going forward. In addition to supporting students in their data science programs with a comprehensive approach respectful of both student diversity and local needs, two-year colleges may also have an opportunity, or even an obligation, to effectively instill principles of general data literacy in their broader undergraduate populations. Additional resources, continued professional development, and effective leadership will be required. These ideas are discussed both generally and within the framework of one two-year college program.
- Research Article
1
- 10.1080/26939169.2025.2486656
- May 30, 2025
- Journal of Statistics and Data Science Education
The presence of data science has been profound in the scientific community in almost every discipline. An important part of the data science education expansion has been at the undergraduate level. We conducted a systematic literature review to (a) portray current evidence and knowledge gaps in self-proclaimed undergraduate data science education research and (b) inform policymakers and the data science education community about what educators may encounter when searching for literature using the general keyword “data science education.” While open-access publications that target a broader audience of data science educators and include multiple examples of data science programs and courses are a strength, substantial knowledge gaps remain. The undergraduate data science literature that we identified often lacks empirical data, research questions, and reproducibility. Certain disciplines are less visible. We recommend that we should (a) cherish data science as an interdisciplinary field; (b) adopt a consistent set of keywords/terminology to ensure data science education literature is easily identifiable; (c) prioritize investments in empirical studies.
- Research Article
2
- 10.70725/525425phybup
- Jan 1, 2022
- Journal of Technology and Teacher Education
Data science and computational thinking (CT) skills are important STEM literacies necessary to make informed daily decisions. In elementary schools, particularly in rural areas, there is little instruction and limited research towards understanding and developing these literacies. Using a Research-Practice Partnership model (RPP; Coburn & Penuel, 2016) we conducted multimethod research investigating nine elementary teachers’ perceptions of data science and related curriculum design during professional development (PD). Connected Learning theory, enhanced with Universal Design for Learning, guided ways we assisted teachers in designing the data science curriculum. Findings suggest teachers maintained high levels of interest in data science instruction and CT before and after the PD and increased their self-efficacy towards teaching data science. A thematic analysis revealed how a data science framework guided curriculum design and assisted teachers in defining, understanding, and co-creating the curriculum. During curriculum design, teachers shared the workload among partners, made collaborative design choices, integrated differentiation strategies, and felt confidence towards teaching data science. Identified challenges included locating data sets and the complexity of understanding data science and related software. This study addresses the research gap in data science education for elementary teachers and assists with successful strategies for data science PD and curricular design.
- Conference Article
10
- 10.1109/educon46332.2021.9453997
- Apr 21, 2021
The emerging data-driven economy, including industry, research and business, requires new types of specialists who are capable of supporting all stages of the data lifecycle, from data production and input to data processing and actionable results delivery, visualisation and reporting, which can be jointly defined as the Data Science professions family. Data Science is becoming a newly recognised field of science that combines Data Analytics methods with the power of Big Data technologies and Cloud Computing, which together provide a basis for effective use of data-driven research and economy models. Data Science research and education require a multi-disciplinary approach and a data-driven/centric paradigm shift. Besides core professional competences and knowledge in Data Science, the increasing digitalisation of Science and Industry also requires new types of workplace and professional skills that raise the importance of the critical thinking, problem solving and creativity required to work in a highly automated and dynamic environment. The education and training of the data-related professions must reflect all the multi-disciplinary knowledge and competences that are required from Data Science and data handling practitioners in modern, data-driven research and the digital economy. In modern conditions of fast technology change and strong skills demand, Data Science education and training should be customizable and delivered in multiple forms, while also providing sufficient lab facilities for practical training. This paper discusses aspects of building customizable and interoperable Data Science curricula for different types of learners and target application domains. The proposed approach is based on the EDISON Data Science Framework (EDSF), initially developed in the EU-funded Project EDISON and currently maintained by the EDISON Community Initiative.
- Research Article
18
- 10.7717/peerj-cs.441
- Mar 25, 2021
- PeerJ. Computer science
The interdisciplinary field of data science, which applies techniques from computer science and statistics to address questions across domains, has enjoyed recent considerable growth and interest. This emergence also extends to undergraduate education, whereby a growing number of institutions now offer degree programs in data science. However, there is considerable variation in what the field actually entails and, by extension, differences in how undergraduate programs prepare students for data-intensive careers. We used two seminal frameworks for data science education to evaluate undergraduate data science programs at a subset of 4-year institutions in the United States; developing and applying a rubric, we assessed how well each program met the guidelines of each of the frameworks. Most programs scored high in statistics and computer science and low in domain-specific education, ethics, and areas of communication. Moreover, the academic unit administering the degree program significantly influenced the course-load distribution of computer science and statistics/mathematics courses. We conclude that current data science undergraduate programs provide solid grounding in computational and statistical approaches, yet may not deliver sufficient context in terms of domain knowledge and ethical considerations necessary for appropriate data science applications. Additional refinement of the expectations for undergraduate data science education is warranted.
- Conference Article
28
- 10.1145/3287324.3287522
- Feb 22, 2019
The ACM Data Science Task Force was established by the ACM Education Council and tasked with articulating the role of computing discipline-specific contributions to this emerging field. This special session seeks to introduce the work of the ACM Data Science Task Force as well as to engage the SIGCSE community in this effort. Members of the task force will introduce key components of a draft report, including a summary of data science curricular efforts to date, results of ACM academic and industry surveys on data science, as well as the initial articulation of computing competencies for undergraduate programs in data science. This session should be of interest to all SIGCSE attendees, but especially faculty developing college-level curricula in Data Science.