Abstract

Digital data from the political sphere is abundant, omnipresent, and increasingly accessible directly through the Internet. Project Vote Smart (PVS) is a prominent example of such big public data and covers various aspects of U.S. politics in astonishing detail. Despite the vast potential of PVS’s data for political science, economics, and sociology, it is hardly used in empirical research. The systematic compilation of semi-structured data can be complicated and time-consuming because the data format is not designed for conventional scientific research. This paper presents a new tool that makes the data easily accessible to a broad scientific community. We provide the software, called pvsR, as an add-on to the R programming environment for statistical computing. This open source interface (OSI) serves as a direct link between a statistical analysis and the large PVS database. The free and open code is expected to substantially reduce the cost of research with PVS’s new big public data across a wide variety of applications. We discuss its advantages vis-à-vis traditional methods of data generation as well as existing interfaces. We document the validity of the library with an illustration involving female representation in local politics. In addition, pvsR facilitates low-cost replication of research with PVS data, including the pre-processing of the data. Similar OSIs are recommended for other big public databases.
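
The compilation problem described in the abstract can be made concrete. Web APIs of this kind often return nested, semi-structured records in which a field may be missing, a single record, or a list of records, so the raw responses do not map directly onto a rectangular dataset. The following Python sketch (an illustration only; pvsR itself is an R package) flattens such responses into a table. The response layout and the field names used in it (candidateList, candidate, office) are hypothetical assumptions, not the actual PVS schema.

```python
from typing import Any


def as_list(node: Any) -> list:
    """Normalise a node that may be absent, a single record, or a list."""
    if node is None:
        return []
    return node if isinstance(node, list) else [node]


def flatten_record(record: dict, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted column names, e.g. "office.name"."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_record(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def flatten_records(response: dict, path: list[str]) -> list[dict[str, Any]]:
    """Follow `path` into a nested response; return one flat dict per record.

    The path keys are hypothetical; a real service defines its own layout."""
    node: Any = response
    for key in path:
        if not isinstance(node, dict):
            return []
        node = node.get(key)
    return [flatten_record(r) for r in as_list(node)]


def to_table(rows: list[dict[str, Any]]) -> tuple[list[str], list[list[Any]]]:
    """Take the union of all column names and pad missing fields with None."""
    columns = sorted({k for row in rows for k in row})
    return columns, [[row.get(c) for c in columns] for row in rows]
```

Taking the union of column names and padding missing fields with None mirrors what a row-binding step does when records from different queries carry different fields.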

Highlights

  • In recent years, the dawn of the new discipline of ‘computational’ social science has been widely discussed

  • We present a working example of how the relatively simple retrieval and aggregation of high-quality data on U.S. politics from the Project Vote Smart (PVS) API via pvsR has the potential to replace common survey practices in political science

  • An open source interface (OSI) might partly supersede the duplication of big data in journal archives in cases where the raw data is compiled via an API

Introduction

The dawn of the new discipline of ‘computational’ social science has been widely discussed (see, e.g., [1, 2, 3, 4]). We use the term Open Source Interface (hereafter OSI) to describe API client libraries that are tailored for social science research. Such libraries work as well-documented add-ons to statistical software packages such as R and offer guided, high-level access to data from a web service. With the increase in publicly available big data, more and more studies base their research on “unique” and “original” datasets. While this development is welcome, these new studies are harder to replicate than empirical analyses based on traditional sources such as official statistics.
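
The OSI idea described here can be sketched as a thin client that builds request URLs, loops over many parameter sets, and row-binds the results into one table ready for analysis. The Python sketch below is hypothetical: the base URL, the method name, and the `key` parameter are placeholders rather than the PVS API or pvsR's actual interface, and the network fetcher can be swapped for a stub so the logic can be tested offline.

```python
import json
from typing import Any, Callable, Iterable
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder endpoint; NOT the real PVS API.
BASE_URL = "https://api.example.org"


def build_url(method: str, params: dict[str, Any], key: str) -> str:
    """Compose a GET request URL for a (hypothetical) API method."""
    query = {"key": key, **params}
    return f"{BASE_URL}/{method}?{urlencode(query)}"


def fetch_json(url: str) -> Any:
    """Download a URL and decode its JSON body (default network fetcher)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def query_many(
    method: str,
    param_sets: Iterable[dict[str, Any]],
    key: str,
    extract: Callable[[Any], list[dict[str, Any]]],
    fetch: Callable[[str], Any] = fetch_json,
) -> list[dict[str, Any]]:
    """Run one request per parameter set and row-bind the extracted records.

    Each output row carries the query parameters that produced it, so the
    combined table stays traceable to its source request. Failed requests
    become rows with an "error" column instead of aborting the batch."""
    rows: list[dict[str, Any]] = []
    for params in param_sets:
        try:
            response = fetch(build_url(method, params, key))
        except (OSError, ValueError) as err:
            rows.append({**params, "error": str(err)})
            continue
        for record in extract(response):
            rows.append({**params, **record})
    return rows
```

For example, `query_many("getOfficials", [{"stateId": "NY"}, {"stateId": "CA"}], "my-key", lambda r: r["items"])` would issue two requests and return one combined list of rows; `getOfficials`, `stateId`, and `items` are invented names. Recording failed queries as rows with an error column, rather than aborting, is one possible design choice: it keeps long batch downloads going and makes gaps in the compiled data visible.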
