Special Considerations for the Acquisition and Wrangling of Big Data

Michael T Braun, Goran Kuljanin, Richard P Deshon

doi:10.1177/1094428117690235

Abstract

Organizational scientists must capitalize on the big data revolution to better understand the nomothetic, idiographic, multilevel, and/or dynamic processes that make up today’s workplace. At the same time, researchers must collect high-quality data and be careful, diligent, and deliberate during data wrangling and data analysis so that all results can be replicated and all inferences are appropriate. Unfortunately, big data create many uncommon challenges during data acquisition and data wrangling that must be considered and overcome to fulfill the promise and potential of big data. Specifically, during acquisition, organizational scientists must become familiar with concepts like web scraping and databases and determine how to divide big data files into manageable chunks for cleaning and analysis, all while taking care not to violate data usage rules and regulations. Likewise, once data are acquired, researchers wrangling them into a form ready for analysis must be able to handle multiple file formats and data encoding standards, use a variety of software to visualize and diagnose data structure, and be adept at using functions and algorithms to determine variable structure and evaluate records and variables for missing or erroneous information. The current article provides a concise definition of big data and addresses each of these novel challenges and concepts related to big data acquisition and wrangling, with a focus on providing guidance and recommendations. Finally, a detailed big data example, team development using play-by-play basketball data, is provided. Each step of the process of scraping the data from the web and wrangling the multilevel big data into tidy data form is discussed, accompanied by a supplemental R file that contains all of the code necessary for researchers to replicate the described procedure.

Full Text