Using State Space Grids to Quantify and Examine Dynamics of Dyadic Conversation

ABSTRACT This paper illustrates how to implement state space grid analysis to examine the back-and-forth, multi-turn dynamics that manifest in dyadic conversations. In doing so, we contribute to a dynamic dyadic systems perspective that seeks to advance theories about individual and relational antecedents and outcomes of interpersonal communication through the articulation and study of dyadic interaction dynamics. We first review state space grid terminology, data requirements, visualizations, and quantifications, and how state space grid analysis maps to conversation-related research questions. We then illustrate the application of state space grids to the examination of dyadic support conversations; specifically, we demonstrate how this method can be used to derive parameters that operationalize the use of, or movement in, the state space and to examine between-dyad differences in movement through the state space. Empirically, we found that conversation behavior flexibility was related to characteristics of the dyadic relationship but not to support receiver outcomes, and that our operationalizations of conversation attractors were related to neither relational characteristics nor support receiver outcomes. We conclude with a discussion of theoretical, methodological, and practical issues that researchers can consider when using state space grids in the analysis of conversation dynamics.
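A minimal sketch of one quantification the paper discusses: dispersion, a standard state space grid measure of conversational flexibility, computed from a sequence of coded turn pairs. The 3x3 grid, the behavior codes, and the `dispersion` helper below are illustrative assumptions, not the authors' implementation; SSG research commonly relies on dedicated tools such as GridWare.

```python
from collections import Counter

def dispersion(cell_sequence, n_cells):
    """Dispersion of a state space grid trajectory.

    0 = all events fall in a single cell; 1 = events spread evenly
    across all cells (follows the common GridWare-style formula).
    """
    counts = Counter(cell_sequence)              # events per visited cell
    total = len(cell_sequence)
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    return 1 - (n_cells * sum_sq - 1) / (n_cells - 1)

# Hypothetical coding: each turn pair is (speaker_A_code, speaker_B_code)
# on a 3x3 grid of, e.g., negative/neutral/positive behavior codes.
turns = [(0, 1), (0, 1), (1, 1), (2, 2), (0, 1), (1, 2)]
cells = [a * 3 + b for a, b in turns]            # flatten grid coordinates
print(f"dispersion = {dispersion(cells, n_cells=9):.3f}")
```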

Bootstrapping public entities. Domain-specific NER for public speakers

ABSTRACT Named Entity Recognition (NER) is a supervised machine learning task that finds various applications in automated content analysis, as the identification of entities is vital for understanding public discourse. However, the standard NER labels are sometimes not specific enough for a given domain. We introduce Public Entity Recognition (PER), a domain-specific version of NER trained for five entity types common in public discourse: politicians, parties, authorities, media, and journalists. PER can be used for pre-processing documents, in a pipeline with other classifiers, or directly for analyzing information in texts. The taxonomy for PER is taken from the database of (German) public speakers and aims at low-threshold integration into computational social science research. We experiment with different training settings, involving weakly supervised training and training on manually annotated data. We evaluate multilingual transformer models of different sizes against rule-based entity matching and find that the models not only outperform the baseline but also reach competitive absolute scores of around .8 and higher in F1. We further test for generalization and domain adaptation and show that with only around 100–150 additional sentences, the model can be adapted to new languages.
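A hedged sketch of how PER-style inference could look with the Hugging Face transformers pipeline. The checkpoint name `example/per-model` is a placeholder, not a released model, and the label set simply mirrors the five entity types named in the abstract.

```python
# Minimal sketch of domain-specific NER inference with Hugging Face
# transformers; "example/per-model" is a placeholder checkpoint, and the
# five PER labels are taken from the abstract, not from a released model.
from transformers import pipeline

PER_LABELS = {"POLITICIAN", "PARTY", "AUTHORITY", "MEDIA", "JOURNALIST"}

ner = pipeline("token-classification",
               model="example/per-model",        # placeholder identifier
               aggregation_strategy="simple")    # merge word pieces into spans

text = "Angela Merkel (CDU) sprach mit Journalisten der Tagesschau."
entities = [e for e in ner(text) if e["entity_group"] in PER_LABELS]
for e in entities:
    print(e["word"], e["entity_group"], round(e["score"], 2))
```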

On Measurement Validity and Language Models: Increasing Validity and Decreasing Bias with Instructions

ABSTRACT Language models like BERT or GPT are becoming increasingly popular measurement tools, but are the measurements they produce valid? The literature suggests that there is still a relevant gap between the ambitions of computational text analysis methods and the validity of their outputs. One prominent threat to validity is hidden biases in the training data, where models learn group-specific language patterns instead of the concept researchers want to measure. This paper investigates to what extent these biases impact the validity of measurements created with language models. We conduct a comparative analysis across nine group types in four datasets with three types of classification models, focusing on the robustness of models against biases and on the validity of their outputs. While we find that all types of models learn biases, the effects on validity are surprisingly small. In particular, when models receive instructions as an additional input, they become more robust against biases from the fine-tuning data and produce more valid measurements across different groups. An instruction-based model (BERT-NLI) sees its average test-set performance decrease by only 0.4% F1 macro when trained on biased data, and its error probability on groups it has not seen during training increases by only 0.8%.
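As a rough illustration of instruction-based classification in the NLI style of BERT-NLI, the sketch below uses the zero-shot classification pipeline from Hugging Face transformers, where the class description enters the model as a natural-language hypothesis (the "instruction"). The model choice, labels, and template are illustrative, not the paper's exact setup.

```python
# Sketch of NLI-based classification: each candidate class is phrased as
# a hypothesis, so the same model transfers across label sets. The model
# and labels here are illustrative, not the paper's configuration.
from transformers import pipeline

clf = pipeline("zero-shot-classification",
               model="facebook/bart-large-mnli")  # any NLI checkpoint works

text = "The government announced new climate subsidies today."
labels = ["economic policy", "environmental policy", "foreign policy"]
hypothesis = "This text is about {}."              # instruction template

result = clf(text, candidate_labels=labels, hypothesis_template=hypothesis)
print(result["labels"][0], round(result["scores"][0], 3))
```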

Googling Politics? Comparing Five Computational Methods to Identify Political and News-related Searches from Web Browser Histories

ABSTRACT Search engines play a crucial role in today’s information environment. Yet, political and news-related (PNR) search engine use remains understudied, mainly due to the lack of suitable measurement methods to identify PNR searches. Existing research focuses on specific events, topics, or news articles, neglecting the broader scope of PNR search. Furthermore, self-reporting issues have led researchers to use browsing history data, but scalable methods for analyzing such data are limited. This paper addresses these gaps by comparing five computational methods to identify PNR searches in browsing data: browsing sequences, a context-enhanced dictionary, traditional supervised machine learning (SML), transformer-based SML, and zero-shot classification. Using Dutch Google searches as a test case, we analyze Dutch browsing history data obtained via data donations in May 2022 and linked to surveys (N users = 315; N records = 9,868,209; N searches = 697,359), along with 35.5k manually annotated search terms. The findings highlight substantial variation in accuracy, with some methods being better suited to narrower topics. We recommend a two-step approach, applying zero-shot classification followed by human evaluation. This methodology can inform future empirical research on PNR search engine use.
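For concreteness, here is a toy sketch of the traditional SML approach from the comparison: TF-IDF features with logistic regression over annotated search terms, in scikit-learn. The queries and labels below are fabricated for illustration; the paper's actual training data are the 35.5k annotated Dutch search terms.

```python
# Toy sketch of the "traditional SML" baseline: TF-IDF features plus
# logistic regression. Queries and labels are fabricated examples.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

queries = ["verkiezingen 2022 uitslag", "pizza bestellen amsterdam",
           "rutte persconferentie", "weer morgen utrecht"]
labels = [1, 0, 1, 0]                      # 1 = political/news-related (PNR)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(queries, labels)

print(model.predict(["kabinet formatie nieuws", "goedkope vluchten"]))
```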

What’s in a name? The effect of named entities on topic modelling interpretability

ABSTRACT Topic modelling (TM) has established itself as one of the major text-as-data methodologies within the social sciences in general, and in communication science in particular. The core strength of TM approaches is that TM is essentially a 2-in-1 method: it both generates clusters into which the texts may fall and classifies the texts according to those clusters. Previous research has pointed out that pre-processing text corpora is as much a part of text analysis as the later stages. Named Entity Recognition (NER), however, is not often thought of when pre-processing texts for analysis and has thus far not received much attention in relation to the TM pipeline. If simply retaining or removing stop words can produce different interpretations of the outcomes of TM, retaining or removing named entities (NEs) also has consequences for outcomes and interpretations. The current paper analyses the effects that removing or retaining NEs has on the interpretability of topic models. Both model statistics and human validation are used to address this issue. The results show differences in topic models trained on corpora with and without NEs. TMs trained on corpora where NEs are removed exhibit different structural characteristics and, more importantly, are perceived differently by human coders. We formulate recommendations regarding the pre-processing of NEs in TM applications.
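A small sketch of the pre-processing choice the paper studies: stripping NEs from a corpus before fitting a topic model, here with spaCy for NER and scikit-learn's LDA. The spaCy model, the two-document corpus, and the LDA settings are illustrative assumptions, not the paper's pipeline.

```python
# Sketch: remove named entities before topic modelling. Assumes the
# standard small English spaCy pipeline is installed; corpus and LDA
# settings are illustrative only.
import spacy
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

nlp = spacy.load("en_core_web_sm")

def strip_entities(text):
    """Return the text with all tokens inside named entities dropped."""
    doc = nlp(text)
    return " ".join(tok.text for tok in doc if tok.ent_type_ == "")

docs = ["Angela Merkel discussed climate policy in Berlin.",
        "The senate voted on the new housing bill."]
cleaned = [strip_entities(d) for d in docs]   # corpus variant without NEs

dtm = CountVectorizer(stop_words="english").fit_transform(cleaned)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)
```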
