Abstract

Increasingly, human behavior can be monitored through the collection of data from digital devices revealing information on behaviors and locations. In the context of higher education, a growing number of schools and universities collect data on their students with the purpose of assessing or predicting behaviors and academic performance, and the COVID-19–induced move to online education dramatically increases what can be accumulated in this way, raising concerns about students’ privacy. We focus on academic performance and ask whether predictive performance for a given dataset can be achieved with less privacy-invasive, but more task-specific, data. We draw on a unique dataset on a large student population containing both highly detailed measures of behavior and personality and high-quality third-party reported individual-level administrative data. We find that models estimated using the big behavioral data are indeed able to accurately predict academic performance out of sample. However, models using only low-dimensional and arguably less privacy-invasive administrative data perform considerably better and, importantly, do not improve when we add the high-resolution, privacy-invasive behavioral data. We argue that combining big behavioral data with “ground truth” administrative registry data can ideally allow the identification of privacy-preserving task-specific features that can be employed instead of current indiscriminate troves of behavioral data, with better privacy and better prediction resulting.

Highlights

  • Human behavior can be monitored through the collection of data from digital devices revealing information on behaviors and locations

  • In the field of higher education, the application of digital data on student behavior in and out of classrooms for predictive purposes is known as learning analytics or educational data mining

  • Digital traces of student behavior at the individual level (for example, interaction with digital portals, WiFi use, and location-based services) can be used to predict key student outcomes such as academic performance and dropout [15,16,17]


Summary

SOCIAL SCIENCES

Task-specific information outperforms surveillance-style big data in predictive analytics. The key premise of learning analytics is that pervasive data collection and analysis allow for informative predictions about academic behavior and outcomes [1], potentially at the cost of student privacy and agency [2], as highlighted in recent media coverage [3, 4]. Such concerns have most recently been amplified by the massive shift toward online education following the COVID-19 pandemic [5,6,7]. In our sample of engineering college students, using the administrative registry data described below, 68.6% of students with a medium-level high school grade point average (GPA) place in the medium GPA range in college. This suggests that predicting academic performance may not require knowledge of sensitive information from browser use or smartphones, but rather a measure of past performance that is often used as an entry criterion in higher education in the first place.
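
The kind of conditional share quoted above (students in the medium high school GPA band who also land in the medium college GPA band) can be sketched in a few lines of Python. This is only an illustration: the summary does not say how the "medium" range is defined, so rank terciles within the sample are assumed here, the function names `tercile_bands` and `share_same_band` are hypothetical, and the GPA values are made-up toy numbers, not data from the study.

```python
def tercile_bands(values):
    """Label each value 'low', 'medium', or 'high' by its rank tercile.

    Assumption: bands are equal-sized rank terciles within the sample;
    the source does not state how the GPA ranges were defined.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        labels[i] = ("low", "medium", "high")[min(3 * rank // n, 2)]
    return labels


def share_same_band(hs_gpa, college_gpa, band="medium"):
    """Share of students in `band` for high school GPA who are also in `band` for college GPA."""
    hs = tercile_bands(hs_gpa)
    college = tercile_bands(college_gpa)
    college_of_band = [c for h, c in zip(hs, college) if h == band]
    return sum(c == band for c in college_of_band) / len(college_of_band)


# Toy values on an arbitrary grading scale (illustrative only).
hs_gpa = [6.1, 7.0, 7.4, 8.2, 8.8, 9.1, 9.5, 10.2, 11.0]
college_gpa = [5.0, 6.5, 4.8, 7.1, 7.9, 6.0, 9.0, 8.1, 9.4]

# Here 2 of the 3 medium-band high school students stay in the medium band.
print(share_same_band(hs_gpa, college_gpa))
```

A high share on real data would indicate that a single, already-collected administrative variable carries much of the predictive signal, which is the intuition the summary draws on.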

