The proliferation of social media platforms has given social scientists unprecedented access to vast troves of data on human interactions, facilitating the study of online behavior at scale. These platforms typically structure conversations as threads, forming tree-like structures known as "discussion trees." This paper examines the structural properties of online discussions on Reddit by analyzing both global (community-level) and local (post-level) attributes of these discussion trees. We conduct a comprehensive statistical analysis of a year's worth of Reddit data, encompassing a quarter of a million posts and several million comments. Our primary objective is to disentangle the relative impacts of global and local properties and to evaluate how specific features shape discussion tree structures. The results reveal that both local and global features contribute significantly to explaining structural variation in discussion trees. However, local features, such as post content and sentiment, collectively have a greater impact, accounting for a larger proportion of the variation in the width, depth, and size of discussion trees. Our analysis also uncovers considerable heterogeneity in the impact of individual features on discussion structures. Notably, certain global features, including the subreddit's topic, age, popularity, and content redundancy, play crucial roles in determining specific discussion tree properties. For instance, posts in subreddits focused on politics, sports, and current events tend to generate deeper and wider discussion trees. This research enhances our understanding of online conversation dynamics and offers valuable insights for both content creators and platform designers. By elucidating the factors that shape online discussions, our work contributes to ongoing efforts to improve the quality and effectiveness of digital discourse.