Abstract

The <em>By the People</em> (<em>BTP</em>) datasets comprise text of selected collections of the Library of Congress (LOC) created by volunteers in the <em>By the People</em> crowdsourced transcription program, which invites public transcription of historical documents. All transcriptions are created and reviewed by volunteers in a consensus-based model in which two or more volunteers must agree on a transcription for it to be considered complete. Resulting transcriptions are added to the digital collections alongside the images to enable search and accessibility of the collections. Additionally, completed transcription “campaigns” are published as freely downloadable datasets of .CSV files containing all campaign transcriptions, as well as minimal metadata. The datasets can support a multitude of purposes including computational research in fields such as history, linguistics, economics, and political science.

Highlights

  • Seven completed campaigns, including a total of 23,316 transcriptions, have been published as datasets for bulk download, and these are the primary subject of this paper

  • All transcriptions are created and reviewed by volunteers using a consensus model, in which at least two volunteers must agree on a transcription for it to be marked as complete

  • We provide the Rosa Parks<sup>10</sup> dataset description here as a representative example of the seven currently available datasets and as a template for future releases. The dataset is a .ZIP file containing a .CSV file and a README file. The .CSV file, rosaparks-in-her-own-words-2021-04-19.csv, contains campaign, project, item, itemID, asset, and asset status metadata, as well as an image link and the volunteer-generated transcription
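
As a rough illustration of how a campaign .CSV of this shape might be explored, the sketch below counts rows per asset status and totals the words in the transcriptions. The column names `AssetStatus` and `Transcription`, and the sample rows, are assumptions made for illustration; the actual header in a downloaded file may differ and should be checked against its README.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; the real header in a campaign .CSV may differ.
STATUS_COLUMN = "AssetStatus"
TEXT_COLUMN = "Transcription"


def summarize_transcriptions(csv_file):
    """Return (rows per asset status, total word count of all transcriptions)."""
    reader = csv.DictReader(csv_file)
    status_counts = Counter()
    word_total = 0
    for row in reader:
        # Missing columns or empty cells are treated as empty strings.
        status_counts[row.get(STATUS_COLUMN) or ""] += 1
        word_total += len((row.get(TEXT_COLUMN) or "").split())
    return status_counts, word_total


# Made-up rows standing in for a downloaded campaign file.
SAMPLE = io.StringIO(
    "Campaign,Project,Item,ItemId,Asset,AssetStatus,DownloadUrl,Transcription\n"
    "Demo,Demo project,Demo item,id-1,page-1,completed,"
    "https://example.org/1.jpg,Dear friend I write to you\n"
    "Demo,Demo project,Demo item,id-1,page-2,completed,"
    "https://example.org/2.jpg,Yours truly\n"
)

if __name__ == "__main__":
    counts, words = summarize_transcriptions(SAMPLE)
    print(dict(counts), words)
```

To run this against a real download, the in-memory sample could be replaced with something like `open("rosaparks-in-her-own-words-2021-04-19.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is also an assumption.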


Summary

SUMMARY OF CAMPAIGN

Selection from the papers of reformer, poet, editor, and clergyman William Oland Bourne (1819–1901). Includes narratives submitted by disabled Union veterans in a Left-hand Penmanship contest sponsored by Bourne as well as Civil War reminiscences by soldiers and sailors in Central Park Hospital, New York, N.Y.

Selections from the papers of Branch Rickey, major league baseball manager and executive, consisting of scouting reports from the 1950s and 1960s. They are mostly concentrated in the years 1951–1956 and 1962–1963, while Rickey was associated, respectively, with the Pittsburgh Pirates and St. Louis Cardinals.

