Abstract

Time-to-event analysis is common in political science, and machine learning methods have seen increasing use in quantitative political science research in recent years. This article advocates for the implementation of machine learning duration models to assist in a sound model selection process. We provide a brief tutorial introduction to the random survival forest (RSF) algorithm and contrast it with a popular predecessor, the Cox proportional hazards model, with emphasis on methodological utility for political science researchers. We implement both methods for simulated time-to-event data and the Power-Sharing Event Dataset (PSED) to assist researchers in evaluating the merits of machine learning duration models. We provide evidence of significantly higher survival probabilities for peace agreements with third-party-mediated design and implementation. We also detect increased survival probabilities for peace agreements that incorporate territorial power-sharing and avoid multiple rebel party signatories. Further, the RSF, a previously under-used method for analyzing political science time-to-event data, provides a novel approach for ranking the importance of peace agreement criteria in predicting peace agreement duration. Our findings demonstrate a scenario exhibiting the interpretability and performance of RSF for political science time-to-event data. These findings support the interpretability and competitive performance of the random survival forest algorithm in numerous circumstances, and they promote a diverse, holistic model-selection process for time-to-event political science data.

Highlights

  • In modern statistical methodology, two primary classes of predictive models exist

  • By comparing the Cox proportional hazards and random survival forest (RSF) models, we demonstrate the suitability and flexibility gained from expanding time-to-event analysis to algorithmic models within the political science context [9]

  • To avoid complicating the assessment, we limit the magnitude of most predictor variables’ effects to negligibly low effect sizes, and as such, we focus on detecting the predictive effects of only a few continuous variables and primarily the one categorical variable, all of which are determined by construction to be strongly related to duration
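
The simulation design described in the last highlight can be illustrated with a minimal sketch. It is not the authors' simulation: the number of predictors, every coefficient, the constant baseline hazard, the censoring rate, and the names `simulate_durations` and `mean_time_by_group` are assumptions chosen only to show the idea of a few strong predictors (two continuous, one categorical) alongside several with negligible effects. Durations are drawn from an exponential proportional-hazards model with independent right-censoring, using only the Python standard library.

```python
import math
import random


def simulate_durations(n=500, seed=42):
    """Simulate right-censored durations under an exponential
    proportional-hazards model.

    Every numeric value below is an illustrative assumption,
    not a value taken from the article.
    """
    rng = random.Random(seed)
    strong_betas = [0.8, -0.8]   # assumed: two continuous predictors with strong effects
    weak_betas = [0.01] * 5      # assumed: five predictors with negligible effects
    group_effects = {"a": 0.0, "b": 1.0, "c": -1.0}  # assumed: one categorical predictor
    baseline_hazard = 0.1        # assumed constant baseline hazard
    censoring_rate = 0.05        # assumed rate of independent exponential censoring

    rows = []
    for _ in range(n):
        strong = [rng.gauss(0, 1) for _ in strong_betas]
        weak = [rng.gauss(0, 1) for _ in weak_betas]
        group = rng.choice(sorted(group_effects))
        linear_predictor = (
            sum(b * x for b, x in zip(strong_betas, strong))
            + sum(b * x for b, x in zip(weak_betas, weak))
            + group_effects[group]
        )
        # Proportional hazards: covariates scale the baseline hazard.
        hazard = baseline_hazard * math.exp(linear_predictor)
        event_time = rng.expovariate(hazard)
        censor_time = rng.expovariate(censoring_rate)
        rows.append({
            "time": min(event_time, censor_time),   # observed duration
            "event": event_time <= censor_time,     # False means right-censored
            "strong": strong,
            "weak": weak,
            "group": group,
        })
    return rows


def mean_time_by_group(rows):
    """Average observed duration for each level of the categorical predictor."""
    totals = {}
    for row in rows:
        total, count = totals.get(row["group"], (0.0, 0))
        totals[row["group"]] = (total + row["time"], count + 1)
    return {g: total / count for g, (total, count) in totals.items()}


data = simulate_durations()
group_means = mean_time_by_group(data)
```

Under these assumed coefficients, group "b" has a higher hazard than group "c", so its observed durations should be shorter on average. Data of this shape could then be passed to a Cox model and a random survival forest to check whether each method recovers the strong predictors and ignores the negligible ones.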


Summary

Introduction

Two primary classes of predictive models exist. The traditional modeling approach assumes the data fits a stochastic model, whereas the algorithmic modeling approach, commonly referred to as machine learning, assumes no functional

Conflict management data analysis using survival random forests described using their original codebook file. Third-party mediation characteristics were collected and added to this dataset by Chong Chen in 2015. The raw data may be found at the following link: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/29657. Since the data is low-dimensional, we have provided our cleaned data as a CSV file in a Supporting information file.

