The utility of digital trace data to understand how people search for jobs online

Similar Papers
  • Research Article
  • 10.23889/ijpds.v10i5.3349
The Value and Challenges of Making Survey and Digital Trace Datasets Available for Open Access
  • Oct 6, 2025
  • International Journal of Population Data Science
  • Riza Battista-Navarro + 4 more

Introduction & Background: Over the last two decades, the digital revolution has led to an explosion of new data sources commonly referred to as digital footprint or digital trace data (DTD). This rapid expansion in digital data sources has pushed survey research into a new era of development that now centres on its linkage with various participant DTD. This culture shift has unlocked a range of novel opportunities for social scientists to access rich new sources of insight into human behaviour, which can be used to augment, validate or even replace conventional self-reported survey data. However, there remains a critical gap around how to maintain respondent anonymity when DTD are released as open access.
Objectives & Approach: This paper will focus on demonstrating the conceptual and methodological value and challenges in producing anonymised and standardised variables from survey respondents’ DTD. We will do this using existing YouGov datasets: two collected in the US in 2020 and 2024, and a third collected in the UK in 2022. The US datasets link individual survey responses to respondents’ Twitter/X feeds, and the UK dataset links them to respondents’ browsing history. All three datasets were designed to address research questions about the effects of digital media consumption and exposure on citizen attitudes and behaviours. This paper aims to establish a standardised, automated and replicable process for generating anonymised variables from the DTD which can be safely linked to respondent survey data and openly shared with the wider research community.
Relevance to Digital Footprints: The aim of this work is to encourage other researchers working with digital footprint data to consider the ethical and legal implications they face when looking to make their DTD open access. Our work aims to resolve the conflict between open access and data protection, bridging the gap by establishing a process for deriving anonymous unit-level variables which can be released in lieu of the raw DTD. While not designed to be an entirely prescriptive method, this paper strives to inform strategies for making DTD open access and to start the process of creating better standardised practices within the discipline.
Conclusions & Implications: While this paper is still a work in progress, variable generation is underway and will result in the creation and release of a standardised procedure for the anonymisation of DTD. These variables will be created for two specific types of DTD: social media and web-browsing data. However, these variables will be translatable to various other types of DTD, and this paper will be accompanied by step-by-step code and a codebook which can be used by other researchers. This paper will have significant ethical and methodological implications for how researchers working with DTD make their data open access and will hopefully improve transparency and collaboration within the discipline.

  • Research Article
  • Citations: 4
  • 10.1016/j.chb.2024.108281
(In)accuracy and convergent validity of daily end-of-day and single-time self-reported estimations of smartphone use among adolescents
  • May 1, 2024
  • Computers in Human Behavior
  • Michał Tkaczyk + 4 more

Understanding the measurement inaccuracy and bias introduced by self-reports of smartphone use is essential for making meaningful inferences about smartphone use and its effects. Evidence on self-reports of smartphone use in intensive longitudinal studies is largely missing. Based on self-reported and digital trace data from 137 Czech adolescents (41% girls, mean age = 14.95 years), this study examined the accuracy, directional bias, and convergent validity of daily end-of-day and single-time reports of screen time and phone-checking behavior. Overall, the study found considerable discrepancies between self-reported smartphone use and digital trace data, and low between-person convergent validity for all self-reports considered in the study. Respondents usually reported shorter screen time and a lower frequency of phone-checking behavior than the digital trace data showed, in both daily and single-time self-reports. The within-person convergent validity between daily reports and digital tracking was low, indicating that self-reports capture actual day-to-day fluctuations in smartphone use poorly. This study adds to the existing evidence that self-report-based insights into how people use smartphones differ considerably from digital trace data, and it shows that both person-level and situational-level factors contribute to explaining the discrepancy between digital trace and self-report data among adolescents.

  • Research Article
  • Citations: 1
  • 10.51890/2587-7399-2021-6-2-91-98
Digital trace data as a tool for assessing competencies: the case of the Gazprom neft
  • Jun 30, 2021
  • PROneft’. Proffessional’no o nefti
  • T A Lezina + 2 more

Large companies can use the analysis of employees’ digital trace data to increase the efficiency and objectivity of the business processes used to assess employee competencies. New technologies make it possible to accumulate data on employees’ work-related activities in company information systems. The results of employees’ training, protocols of their interaction on professional issues, and the results of recruiting procedures form their digital footprints and can be used to regularly assess their professional growth. A significant problem in applying digital footprints to competency assessment is the choice of assessment metrics. At present, there are no described methods of using the digital footprints of personnel. The objective of the work is to describe the case of using digital footprints to assess the level of professional competencies of data science specialists from Gazprom Neft and to describe the approach to assessing the professional competencies of employees using their digital data. As its assessment metric, Gazprom Neft has chosen the level of employee competence development, which is determined through a set of “activities” of employees confirmed by digital artifacts, information about which is entered into the information system. The method for assessing the professional competencies of employees described in the article was used as the basis for an approach to assessing competencies using digital data. This approach makes it possible to increase the efficiency of business processes in HR and can be used in companies of various industries and scales. The key advantages of the approach are its universality and objectivity. The results of the research can be used in companies that use a competency-based approach to the assessment of personnel, and form the first step in the development of the theory and practice of using employees’ digital traces in company management.

  • Research Article
  • 10.13060/csr.2020.015
Digitální stopa: Konec empirické sociologie? [Digital Trace: The End of Empirical Sociology?]
  • Oct 1, 2020
  • Czech Sociological Review
  • Jakub Sedláček

In the 20th century empirical sociology possessed innovative methodological resources that granted it fairly exclusive access to understanding human social life. However, with the advent of digital technologies and increasing migration into the online world, this privilege started to shift into the hands of commercial entities. People of the 21st century now generate data with every step they take (both physical and virtual), and most of the current internet business models are built on the collection, analysis, and commercial utilisation of such data. The 'Digital Trace Data' left behind by billions of online users present an unprecedented opportunity for the study of their behaviour, characteristics, and social interactions. This article seeks to introduce readers to the world of Digital Trace Data and the three main areas in which such data are used: research, commerce, and surveillance. Examples of all three are given to illustrate the potential strengths, weaknesses, and associated risks. The article also seeks to provide warning of a future in which the largest repository of sociological data in history ends up locked behind the doors of commercial enterprises and government institutions.

  • Book Chapter
  • 10.1007/978-3-319-20319-5_3
Twitter in the Analysis of Social Phenomena: An Interpretative Framework
  • Jan 1, 2015
  • Andreas Jungherr

As the use of online services grows and capabilities in data storage as well as analytics keep on rising, researchers become increasingly interested in what digital trace data might tell them about patterns of human behavior. These data sources potentially hold new information on the mechanisms of human interaction and social phenomena. But before we can unlock the potential of these data, we have to establish how they are connected with social phenomena of interest. While there is an increasing body of research linking offline phenomena to patterns in digital trace data, there is surprisingly little research focusing on the mechanisms leading users to interact with online services. This is a serious deficit in the literature, as digital data traces do not offer us a direct view on social phenomena or events; instead, they offer us a view of reality mediated through the interests and behavior of users moving in the constricted, semi-public communication spaces provided by various digital services. In this chapter, I develop a framework for the use of digital trace data in social science. Key to this framework is a realization that the reflection of reality emerging from digital trace data might be biased through various mediating factors leading users to interact with digital services. This chapter will discuss the nature of digital trace data, various approaches for their use in the analysis of social phenomena, and close by introducing a framework for their application in social science.
Keywords (machine-generated, not added by the authors): Social Phenomenon, Online Service, Political Event, Twitter User, Digital Method

  • Research Article
  • 10.1177/19401612251335372
Tracing Knowledge Gaps: Investigating the Influence of Education on News Exposure and Knowledge Using Digital Trace Data
  • May 18, 2025
  • The International Journal of Press/Politics
  • Dominique S Wirz + 3 more

The knowledge gap hypothesis (the assumption that an increasing flow of news on a topic fosters a gap in knowledge between the more and the less educated) has been demonstrated in numerous studies throughout the past 60 years. Knowledge gaps are attributed to individual differences in media selection and information processing capacities. However, it has been difficult to investigate the relative influence of selection and processing with conventional research methods. We used an innovative combination of individual-level digital trace and survey data collected in Switzerland (n = 403) and Germany (n = 471) to study the widening of knowledge gaps throughout the communication process. The data were collected at the onset of the COVID-19 pandemic, an extraordinary period of extremely high information inflow on a novel topic. Our analyses show that individuals with lower education use less online news in general and less COVID-19-related news in particular than those with higher education, which results in a difference in knowledge about the origin of COVID-19 (but not on its severity). However, those with lower education do not have a similar share of COVID-19-related news in their news diet, and they learn even more than those with higher education from the COVID-19-related news that they are exposed to. Our study thus suggests that knowledge gaps are predominantly a result of selecting into news use.

  • Research Article
  • Citations: 93
  • 10.1007/s13524-018-0715-2
Promises and Pitfalls of Using Digital Traces for Demographic Research.
  • Oct 1, 2018
  • Demography
  • Nina Cesare + 4 more

The digital traces that we leave online are increasingly fruitful sources of data for social scientists, including those interested in demographic research. The collection and use of digital data also present numerous statistical, computational, and ethical challenges, motivating the development of new research approaches to address these burgeoning issues. In this article, we argue that researchers with formal training in demography, who have a history of developing innovative approaches to using challenging data, are well positioned to contribute to this area of work. We discuss the benefits and challenges of using digital trace data for social and demographic research, and we review examples of current demographic literature that creatively use digital trace data to study processes related to fertility, mortality, and migration. Focusing on Facebook data for advertisers (a novel "digital census" that has largely been untapped by demographers), we provide illustrative and empirical examples of how demographic researchers can manage issues such as bias and representation when using digital trace data. We conclude by offering our perspective on the road ahead regarding demography and its role in the data revolution.

  • Book Chapter
  • 10.1093/oso/9780198899457.003.0007
Processual Shadows
  • Jul 11, 2024
  • Brian T Pentland + 5 more

In many ways, time-stamped digital trace data seem like an ideal resource for research on routine dynamics and other processual phenomena. In other ways, digital traces seem more like shadows of action than direct images of action. We can see that something happened, but it is difficult to say exactly what. This chapter uses data from four dermatology clinics at the University of Rochester Medical Center to examine the strengths and weaknesses of trace data. These data include over 57,000 patient visits. The data allow us to detect changes in routines that went unnoticed by the clinical staff, but do not allow us to estimate basic features of the patient visit, such as waiting time in the examination room. The chapter compares our analysis to Barley’s (1986) classic diachronic analysis of CT scanners in radiology departments. The comparison is instructive since Barley (1986) also used longitudinal data to examine changing work processes in medical clinics, but with ethnographic data rather than digital trace data. The comparison helps clarify some of the advantages, disadvantages, and trade-offs involved in using digital trace data for diachronic processual analysis.

  • Research Article
  • 10.1016/j.jas.2023.105890
Digital formation processes: A high-frequency, large-scale investigation
  • Nov 24, 2023
  • Journal of Archaeological Science
  • Jon Clindaniel + 1 more

  • Research Article
  • 10.1080/19312458.2025.2573265
Using hidden Markov models to assess and correct for measurement error in digital trace data
  • Oct 26, 2025
  • Communication Methods and Measures
  • Paulina Pankowska + 3 more

Digital trace data are increasingly used across the social and behavioral sciences. They allow researchers to access large volumes of highly detailed and continuous information. Such scale and speed cannot be achieved when using traditional sources, such as surveys. Digital traces are also believed to overcome some of the limitations that surveys are criticized for. However, while their use undoubtedly presents researchers with new possibilities, it also introduces new quality challenges that have been increasingly acknowledged. Accounting for these limitations is crucial, as they can lead to biased results and incorrect research findings. Therefore, in this paper, we apply hidden Markov models (HMMs) to digital trace data on Facebook use to assess the nature and incidence of error in measures of Facebook use frequency. HMMs are an attractive method that allows for the estimation and correction of error without the availability of (error-free) gold-standard data, if the assumptions regarding the underlying construct of interest and the nature of the error are met. Our results suggest that the measures derived from digital trace data severely underestimate the frequency of Facebook use for a third of our sample, in particular when not all relevant devices are tracked.

  • Research Article
  • Citations: 5
  • 10.1080/19312458.2022.2037537
Correcting Sample Selection Bias of Historical Digital Trace Data: Inverse Probability Weighting (IPW) and Type II Tobit Model
  • Feb 19, 2022
  • Communication Methods and Measures
  • Chankyung Pak + 2 more

Digital trace data have become one of the central pillars of media research methods. Despite the opportunities for better understanding individual users’ true behaviors in the personalized media environment, many scholars have pointed out the potential for bias in trace data collections, questioning the generalizability of findings based on them. In this study, we propose two statistical bias correction methods, Inverse Probability Weighting (IPW) and Type II Tobit, which are designed to remedy selection bias in inference from digital trace data donated by research participants. Applying these methods to Facebook take-out data, we demonstrate how the correction methods can change estimated effect sizes, which is important for the translation of academic findings into real-world impacts. We conduct two simulation studies, one under fully synthetic and another under partially simulated conditions, and find that Type II Tobit generally provides a more robust and cost-efficient correction method for digital trace data.
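
As a rough illustration of the general idea behind inverse probability weighting named in this abstract, and not the authors' implementation, the sketch below reweights a toy sample of data donors by the inverse of each person's probability of donating. The function name `ipw_mean`, the two user groups, the outcome values, and the participation probabilities are all hypothetical; in practice the probabilities would typically be estimated with a model rather than assumed known.

```python
def ipw_mean(outcomes, groups, participation_prob):
    """Mean of outcomes, weighting each participant by 1 / P(participate | group)."""
    weights = [1.0 / participation_prob[g] for g in groups]
    weighted_sum = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_sum / sum(weights)


# Hypothetical toy sample: heavy users are assumed more likely to donate data,
# so they are over-represented among the five donors below.
outcomes = [10.0, 10.0, 10.0, 10.0, 2.0]            # e.g. hours of use per week
groups = ["heavy", "heavy", "heavy", "heavy", "light"]
participation_prob = {"heavy": 0.8, "light": 0.2}   # assumed known for illustration

naive = sum(outcomes) / len(outcomes)               # unweighted mean of the donors
corrected = ipw_mean(outcomes, groups, participation_prob)

print(f"naive mean: {naive:.2f}, IPW-corrected mean: {corrected:.2f}")
```

Under these assumed probabilities, each light user stands in for more non-participants than each heavy user does, so the weighted mean is pulled toward the under-represented group.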

  • Single Report
  • Citations: 3
  • 10.4054/mpidr-wp-2020-024
Analyzing the effect of time in migration measurement using geo-referenced digital trace data
  • May 1, 2020
  • Lee Fiorio + 6 more

Geo-referenced digital trace data offer unprecedented flexibility in migration estimation. Due to their high temporal granularity, many different migration estimates can be generated from the same dataset by changing the definition parameters. Yet despite the growing application of digital trace data to migration research, strategies for taking advantage of their temporal granularity remain largely underdeveloped. In this paper, we provide a general framework for converting digital trace data into estimates of migration transitions and for systematically analyzing their variation along a quasi-continuous time scale, analogous to a survival function. From migration theory, we develop two simple hypotheses regarding how we expect our estimated migration transition functions to behave. We then test our hypotheses on simulated data and empirical data from three different platforms in two internal migration contexts: geo-tagged Tweets and Gowalla check-ins in the U.S., and cell-phone call detail records in Senegal. Our results demonstrate the need for evaluating the internal consistency of migration estimates derived from digital trace data before using them in substantive research. At the same time, however, common patterns across our three empirical datasets point to an emergent research agenda using digital trace data to study the specific functional relationship between estimates of migration and time and how this relationship varies by geography and population characteristics.

  • Research Article
  • Citations: 10
  • 10.3389/fpsyt.2022.871916
The Feasibility and Utility of Harnessing Digital Health to Understand Clinical Trajectories in Medication Treatment for Opioid Use Disorder: D-TECT Study Design and Methodological Considerations.
  • Apr 29, 2022
  • Frontiers in Psychiatry
  • Lisa A Marsch + 18 more

Introduction: Across the U.S., the prevalence of opioid use disorder (OUD) and the rates of opioid overdoses have risen precipitously in recent years. Several effective medications for OUD (MOUD) exist and have been shown to be life-saving. A large volume of research has identified a confluence of factors that predict attrition and continued substance use during substance use disorder treatment. However, much of this literature has examined a small set of potential moderators or mediators of outcomes in MOUD treatment and may lead to over-simplified accounts of treatment non-adherence. Digital health methodologies offer great promise for capturing intensive, longitudinal ecologically-valid data from individuals in MOUD treatment to extend our understanding of factors that impact treatment engagement and outcomes.
Methods: This paper describes the protocol (including the study design and methodological considerations) from a novel study supported by the National Drug Abuse Treatment Clinical Trials Network at the National Institute on Drug Abuse (NIDA). This study (D-TECT) primarily seeks to evaluate the feasibility of collecting ecological momentary assessment (EMA), smartphone and smartwatch sensor data, and social media data among patients in outpatient MOUD treatment. It secondarily seeks to examine the utility of EMA, digital sensing, and social media data (separately and compared to one another) in predicting MOUD treatment retention, opioid use events, and medication adherence [as captured in electronic health records (EHR) and EMA data]. To our knowledge, this is the first project to include all three sources of digitally derived data (EMA, digital sensing, and social media) in understanding the clinical trajectories of patients in MOUD treatment. These multiple data streams will allow us to understand the relative and combined utility of collecting digital data from these diverse data sources. The inclusion of EHR data allows us to focus on the utility of digital health data in predicting objectively measured clinical outcomes.
Discussion: Results may be useful in elucidating novel relations between digital data sources and OUD treatment outcomes. It may also inform approaches to enhancing outcomes measurement in clinical trials by allowing for the assessment of dynamic interactions between individuals' daily lives and their MOUD treatment response.
Clinical Trial Registration: Identifier: NCT04535583.

  • Research Article
  • Citations: 19
  • 10.1016/j.dss.2019.113133
Reflections on quality requirements for digital trace data in IS research
  • Aug 19, 2019
  • Decision Support Systems
  • Gregory Vial

  • Research Article
  • Citations: 1
  • 10.1080/08923647.2023.2165862
Incorporating Learners’ Digital Trace Data into Self-Regulated Learning Research
  • Jan 20, 2023
  • American Journal of Distance Education
  • Dan Ye

Being able to self-regulate is critical for students to succeed in online learning because of its isolated nature. With the development of technology and online education, learners’ digital trace data can be collected and used to improve teaching and learning. Based on a comprehensive literature review of existing self-regulated learning models, measurements, and related research, this paper proposed a comprehensive theoretical framework for self-regulated learning research in online learning environments by incorporating learners’ digital trace data from learning management systems. The theoretical framework provides a fundamental understanding for interpreting and using digital trace data in educational research.
