Abstract

Recent advances in computing power have enhanced the ability of social scientists to analyze newly available “big data,” but they also create a new set of challenges for researchers building cross-national, longitudinal data sets. While scholars should embrace these new sources of data, we must carefully consider cleaning and coding decisions and how these influence our ultimate findings. In this article, we outline five common issues that researchers may face in building large cross-national, longitudinal data sets and suggest strategies for how to address each: (1) country data consistency, including births, deaths, splits, unifications, and name changes; (2) longitudinal string matching; (3) identifying different types of missing data; (4) using these types of missing data to develop a theoretically and empirically grounded imputation strategy; and (5) understanding whether systemic change is driven by real-world processes or by coding/cleaning choices. We also touch briefly on some general technical and technological considerations when working with large data sets. Throughout, we illustrate issues and strategies with examples drawn from our experience building a cross-national, longitudinal network data set of country-international nongovernmental organization memberships.
