Data Science Education – A Scoping Review

Hossana Twinomurinzi,Nkosikhona Msweli ,Tendani Mawela

doi:10.28945/5173

Abstract

Aim/Purpose: This study aimed to evaluate the extant research on data science education (DSE) to identify the existing gaps, opportunities, and challenges, and make recommendations for current and future DSE. Background: There has been an increase in the number of data science programs especially because of the increased appreciation of data as a multidisciplinary strategic resource. This has resulted in a greater need for skills in data science to extract meaningful insights from data. However, the data science programs are not enough to meet the demand for data science skills. While there is growth in data science programs, they appear more as a rebranding of existing engineering, computer science, mathematics, and statistics programs. Methodology: A scoping review was adopted for the period 2010–2021 using six scholarly multidisciplinary databases: Google Scholar, IEEE Xplore, ACM Digital Library, ScienceDirect, Scopus, and the AIS Basket of eight journals. The study was narrowed down to 91 research articles and adopted a classification coding framework and correlation analysis for analysis. Contribution: We theoretically contribute to the growing body of knowledge about the need to scale up data science through multidisciplinary pedagogies and disciplines as the demand grows. This paves the way for future research to understand which programs can provide current and future data scientists the skills and competencies relevant to societal needs. Findings: The key results revealed the limited emphasis on DSE, especially in non-STEM (Science, Technology, Engineering, and Mathematics) disciplines. In addition, the results identified the need to find a suitable pedagogy or a set of pedagogies because of the multidisciplinary nature of DSE. Further, there is currently no existing framework to guide the design and development of DSE at various education levels, leading to sometimes inadequate programs. The study also noted the importance of various stakeholders who can contribute towards DSE and thus create opportunities in the DSE ecosystem. Most of the research studies reviewed were case studies that presented more STEM programs as compared to non-STEM. Recommendations for Practitioners: We recommend CRoss Industry Standard Process for Data Mining (CRISP-DM) as a framework to adopt collaborative pedagogies to teach data science. This research implies that it is important for academia, policymakers, and data science content developers to work closely with organizations to understand their needs. Recommendation for Researchers: We recommend future research into programs that can provide current and future data scientists the skills and competencies relevant to societal needs and how interdisciplinarity within these programs can be integrated. Impact on Society: Data science expertise is essential for tackling societal issues and generating beneficial effects. The main problem is that data is diverse and always changing, necessitating ongoing (up)skilling. Academic institutions must therefore stay current with new advances, changing data, and organizational requirements. Industry experts might share views based on their practical knowledge. The DSE ecosystem can be shaped by collaborating with numerous stakeholders and being aware of each stakeholder’s function in order to advance data science internationally. Future Research: The study found that there are a number of research opportunities that can be explored to improve the implementation of DSE, for instance, how can CRISP-DM be integrated into collaborative pedagogies to provide a fully comprehensive data science curriculum?

Full Text