Gun violence research is characterized by a dearth of data for measuring key constructs. Social media data may offer an opportunity to substantially narrow that gap, but developing methods for deriving firearms-related constructs from social media data, and understanding the measurement properties of such constructs, are critical precursors to their broader use. This study aimed to develop a machine learning model of individual-level firearm ownership from social media data and to assess the criterion validity of a state-level construct of ownership.
We used survey responses to questions on firearm ownership, linked with Twitter data, to build several machine learning models of firearm ownership. We externally validated these models using a set of firearm-related tweets hand-curated from the Twitter Streaming application programming interface (API), and we created state-level ownership estimates using a sample of users collected from the Twitter Decahose API. We assessed the criterion validity of the state-level estimates by comparing their geographic variation with benchmark measures from the RAND State-Level Firearm Ownership Database.
The logistic regression classifier for gun ownership performed best, with an accuracy of 0.70 and an F1-score of 0.69. We also found a strong positive correlation between Twitter-based estimates of gun ownership and benchmark ownership estimates. For states with at least 100 labeled Twitter users, the Pearson and Spearman correlation coefficients were 0.63 (P<.001) and 0.64 (P<.001), respectively.
Our success in developing a machine learning model of firearm ownership at the individual level with limited training data, as well as a state-level construct that achieves a high level of criterion validity, underscores the potential of social media data for advancing gun violence research.
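The state-level validation step described above (aggregate user-level ownership labels by state, keep only states with at least 100 labeled users, then correlate the resulting estimates with a benchmark) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each state's estimate is simply the share of its labeled users whom the classifier predicts to be gun owners, and the function names, the `(state, predicted_owner)` input format, and the `benchmark` dictionary (standing in for RAND state-level values) are all hypothetical.

```python
from collections import defaultdict
from math import sqrt

MIN_USERS = 100  # minimum labeled users per state, as reported in the abstract


def state_estimates(predictions, min_users=MIN_USERS):
    """Aggregate per-user binary predictions into per-state ownership shares.

    Hypothetical input: iterable of (state, predicted_owner) pairs, where
    predicted_owner is 1 if the classifier labels the user a gun owner, else 0.
    Assumption: a state's estimate is the share of its labeled users predicted
    to be owners; states with fewer than min_users labeled users are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [owners, labeled users]
    for state, owner in predictions:
        counts[state][0] += owner
        counts[state][1] += 1
    return {s: o / n for s, (o, n) in counts.items() if n >= min_users}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def criterion_validity(predictions, benchmark, min_users=MIN_USERS):
    """Correlate thresholded state estimates with benchmark values."""
    est = state_estimates(predictions, min_users)
    states = sorted(s for s in est if s in benchmark)
    x = [est[s] for s in states]
    y = [benchmark[s] for s in states]
    return pearson(x, y), spearman(x, y), states
```

The coefficients are computed by hand here only to keep the sketch dependency-free; in practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients and also return P values.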
The ownership construct is an important precursor for understanding the representativeness of, and variability in, outcomes that have been the focus of social media analyses in gun violence research to date, such as attitudes, opinions, policy stances, sentiments, and perspectives on gun violence and gun policy. The high criterion validity we achieved for state-level gun ownership suggests that social media data may be a useful complement to traditional sources of information on gun ownership, such as survey and administrative data, especially for identifying early signals of changes in geographic patterns of gun ownership, given that social media data become available almost immediately, are generated continuously, and are highly responsive. These results also suggest that other social media-based constructs could be derived computationally, potentially yielding additional insight into firearm behaviors that are currently not well understood. More work is needed to develop other firearms-related constructs and to assess their measurement properties.