Abstract

With recent advances in computing power enhancing the ability of social scientists to analyze newly available “big data,” there is a new set of challenges for researchers building cross-national, longitudinal data sets. While scholars should embrace these new sources of data, we must carefully consider cleaning and coding decisions and how these influence our ultimate findings. In this article, we outline five common issues that researchers may face in building large cross-national, longitudinal data sets and suggest strategies for how to address each: (1) country data consistency, including births, deaths, splits, unifications, and name changes; (2) longitudinal string matching; (3) identifying different types of missing data; (4) using these types of missing data in developing a theoretically and empirically grounded imputation strategy; and (5) understanding whether systemic change is driven by real-world processes or by coding/cleaning choices. We also touch briefly on some general technical and technological considerations when working with large data sets. Throughout, we illustrate issues and strategies with examples drawn from our experience building a cross-national, longitudinal network data set of country-international nongovernmental organization memberships.

