The Internet Never Forgets: A Four-Step Scraping Tutorial, Codebase, and Database for Longitudinal Organizational Website Data

Richard F.J Haans, Marc J Mertens

doi:10.1177/10944281241284941

Abstract

Websites represent a crucial avenue for organizations to reach customers, attract talent, and disseminate information to stakeholders. Despite their importance, strikingly little work in the domain of organization and management research has tapped into this source of longitudinal big data. In this paper, we highlight the unique nature and profound potential of longitudinal website data and present novel open-source code- and databases that make these data accessible. Specifically, our codebase offers a general-purpose setup, building on four central steps to scrape historical websites using the Wayback Machine. Our open-access CompuCrawl database was built using this four-step approach. It contains websites of North American firms in the Compustat database between 1996 and 2020—covering 11,277 firms with 86,303 firm/year observations and 1,617,675 webpages. We describe the coverage of our database and illustrate its use by applying word-embedding models to reveal the evolving meaning of the concept of “sustainability” over time. Finally, we outline several avenues for future research enabled by our step-by-step longitudinal web scraping approach and our CompuCrawl database.
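To make the idea of scraping historical websites through the Wayback Machine more concrete, the following is a minimal sketch and not the authors' codebase. The abstract does not spell out the four steps, so this only illustrates one plausible building block: asking the Wayback Machine's public CDX index for at most one successful capture per year of a domain, then turning the response into archived-page URLs. The function names `build_cdx_query` and `snapshot_urls`, the default year range, and the choice of one capture per year are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Public CDX index endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain, first_year=1996, last_year=2020):
    """Build a CDX query URL for one domain (hypothetical helper).

    The query asks for successful captures only and collapses on the
    first four timestamp digits, i.e. at most one capture per year.
    """
    params = {
        "url": domain,
        "from": str(first_year),
        "to": str(last_year),
        "output": "json",
        "fl": "timestamp,original,statuscode",
        "filter": "statuscode:200",
        "collapse": "timestamp:4",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_urls(cdx_rows):
    """Map year -> archived URL from a parsed JSON CDX response.

    With output=json the first row is a header naming the fields;
    every later row is one capture. The first capture seen for a
    year is kept.
    """
    if not cdx_rows:
        return {}
    header, *rows = cdx_rows
    ts_idx = header.index("timestamp")
    url_idx = header.index("original")
    by_year = {}
    for row in rows:
        timestamp, original = row[ts_idx], row[url_idx]
        year = int(timestamp[:4])
        by_year.setdefault(year, f"https://web.archive.org/web/{timestamp}/{original}")
    return by_year
```

In use, the URL from `build_cdx_query("example.com")` would be fetched with any HTTP client, the JSON body parsed into a list of rows, and the rows passed to `snapshot_urls`; the resulting archived URLs are what a downstream scraper would download and parse. Keeping the network call out of these two functions is a deliberate choice so that the query construction and response handling can be checked offline.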

Journal: Organizational Research Methods
Publication Date: Nov 4, 2024
License type: CC BY-NC 4.0
