Abstract

While data from social media are easily accessible, understanding how individuals expressed themselves on the Internet in its initial years of public availability (the mid-late 1990s) has proved difficult. In this data deposit, I describe how archival data from Geocities homepages were retrieved and processed to remove non-text data, then further refined to create separate datasets, each of which provides unique insights into modes of personal expression on the early Internet. The present paper describes four datasets, all of which were derived from a larger collection of personal websites: (1) a large corpus of raw text data from Geocities personal homepages, (2) a linguistic analysis of basic psychological properties of the same Geocities pages, using an open-source implementation of the Linguistic Inquiry and Word Count (LIWC), (3) a dataset of links between homepages (suitable for network analysis), and (4) a manifest dataset summarizing the size and last update date for each file in the dataset. Data from over 378,000 Geocities pages are included. In addition to providing a detailed description of how these datasets were created, I describe how they might be utilized in future research.
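As a rough illustration of the kind of processing described above (removing non-text data from a page and recording its links to other pages), the following minimal sketch uses only Python's standard library. It is not the procedure used to build the datasets: the class name `PageExtractor`, the function `extract`, and the choice to skip only `script` and `style` content are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collects visible text and outgoing links from one HTML page.

    Hypothetical helper for illustration; not part of the published pipeline.
    """

    SKIP = {"script", "style"}  # assumed set of non-text containers

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.text_parts = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style content.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html, base_url):
    """Return (plain_text, list_of_absolute_links) for one page."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links
```

Calling `extract(page_html, page_url)` on each archived page would yield one text record (for a raw text corpus) and one list of link targets (for a link dataset); real archived pages would likely also need character-encoding handling, which this sketch omits.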

Highlights

  • While data from social media are accessible, understanding how individuals expressed themselves on the Internet in its initial years of public availability has proved difficult

  • As the Internet matured, different venues for personal expression emerged. One such venue was personal homepages: hypertext documents created by everyday users who wished to create a virtual space for themselves

  • Unlike newsgroups, which were interactive, full-duplex forums of interpersonal communication, personal homepages were unidirectional means of communication: they were created for broadcast distribution to a large audience with little or no means of providing feedback

Summary

The early Internet (e.g., content created and posted online during the mid-late 1990s) has received a fair amount of attention within the social-scientific literature. An independent group of digital archivists set out to create an archive of Geocities pages [5]. This archive was the basis for the present dataset.

Geocities personal homepages can be divided into two broad categories: those created prior to 1999 and those created later. In the earlier period, each personal homepage was assigned to a neighborhood and given an independent four-digit number. Each neighborhood had its own community leadership: volunteers who assisted new users and helped to grow their respective neighborhoods [6]. This structure focused on community and proved to be a unique feature. Later homepages followed a much simpler URL structure (“geocities.com/username”), abandoning the neighborhood structure.

[Figure 1: A visual depiction of the process used to create the datasets described in the present paper]
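The two URL styles mentioned in this summary can be told apart mechanically, which may be useful when grouping pages by era or neighborhood. The sketch below is illustrative only: the summary gives “geocities.com/username” for the later style, while the exact neighborhood-era path shape used here (one or more neighborhood names followed by a four-digit number, as in the made-up example `/SiliconValley/1234/`) is an assumption, as are the function and pattern names.

```python
import re
from urllib.parse import urlparse

# ASSUMPTION: a neighborhood-era path is one or more name segments
# followed by a four-digit number, e.g. /SiliconValley/1234/
NEIGHBORHOOD_PATH = re.compile(r"^/((?:[A-Za-z][A-Za-z0-9_]*/)+)(\d{4})(?:/|$)")

# Later style from the summary: geocities.com/username
USERNAME_PATH = re.compile(r"^/([A-Za-z0-9_.\-]+)(?:/|$)")


def classify_homepage(url):
    """Label a Geocities URL as neighborhood-style or username-style."""
    # Accept URLs written without a scheme, such as "geocities.com/username".
    path = urlparse(url if "//" in url else "//" + url).path
    m = NEIGHBORHOOD_PATH.match(path)
    if m:
        return {
            "style": "neighborhood",
            "neighborhood": m.group(1).strip("/"),
            "number": m.group(2),
        }
    m = USERNAME_PATH.match(path)
    if m:
        return {"style": "username", "username": m.group(1)}
    return {"style": "unknown"}
```

The neighborhood pattern is tried first because a neighborhood name would otherwise also satisfy the looser username pattern. Paths that fit neither shape are reported as `unknown` rather than guessed.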

Raw Text Corpus
Psycho-Linguistic Data
Network Dataset
Manifest
Includes
Potential Applications