Abstract
The pervasiveness of the internet has given web language use a central role in society. However, the lack of multilingual corpora and scalable methods has led web language research to focus on English. To address this gap, the present paper situates itself in the register research tradition and explores French and Swedish web registers from a cross-linguistic angle. Methodologically, we combine keyword analysis with multilingual deep learning, proposing an approach that enables computational comparisons across languages. Specifically, we extract keywords for French and Swedish web registers, associate the keywords with fastText word embeddings, and finally cluster these key embeddings. The findings indicate that the clusters are topical and functional, linguistically motivated, and multilingual. The same clusters occur within the same registers in both languages, pointing to shared topical and functional characteristics: the registers are strikingly similar. The dissimilarities, in contrast, indicate that certain registers, such as Narrative blogs, differ to some extent between French and Swedish. Moreover, grammatical specificities such as the position of adjectives explain some dissimilarities.
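The three steps named above (keyword extraction, embedding lookup, clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the smoothed log-ratio keyness score, the plain k-means routine, the toy sentences, and the hypothetical two-dimensional `TOY_VECTORS`, which merely stand in for fastText embeddings placed in a shared cross-lingual space. The abstract does not specify which keyness measure or clustering algorithm was actually used.

```python
import math
from collections import Counter

def keywords(target_docs, reference_docs, top_n=2):
    """Rank target words by a smoothed log-ratio of relative frequencies
    (an assumed keyness measure, not necessarily the one used in the paper)."""
    tgt = Counter(w for d in target_docs for w in d.lower().split())
    ref = Counter(w for d in reference_docs for w in d.lower().split())
    n_tgt, n_ref = sum(tgt.values()), sum(ref.values())
    vocab = len(set(tgt) | set(ref))

    def score(w):
        p_t = (tgt[w] + 1) / (n_tgt + vocab)   # add-one smoothing
        p_r = (ref[w] + 1) / (n_ref + vocab)
        return math.log(p_t / p_r)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(tgt, key=lambda w: (-score(w), w))[:top_n]

def kmeans(points, k, iters=20):
    """Plain k-means, seeded deterministically with the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

# Toy register samples (hypothetical): a blog-like register vs. a reference.
fr_blog = ["je teste ma recette", "je partage la recette"]
fr_ref = ["le ministre annonce la loi", "la loi passe"]
sv_blog = ["jag testar mitt recept", "jag delar recept"]
sv_ref = ["ministern presenterar lagen", "lagen antas"]

# Hypothetical 2-D vectors standing in for cross-lingual word embeddings.
TOY_VECTORS = {
    "je": (0.10, 0.90), "jag": (0.12, 0.88),
    "recette": (0.90, 0.10), "recept": (0.88, 0.12),
}

# Step 1: keywords per language; step 2: look up their vectors.
keys = [("fr", w) for w in keywords(fr_blog, fr_ref)]
keys += [("sv", w) for w in keywords(sv_blog, sv_ref)]
points = [TOY_VECTORS[w] for _, w in keys]

# Step 3: cluster the key embeddings and group keywords by cluster label.
labels = kmeans(points, k=2)
clusters = {}
for (lang, word), label in zip(keys, labels):
    clusters.setdefault(label, []).append(f"{lang}:{word}")
print(clusters)
```

In this toy setting, a cluster that contains keywords from both languages is the kind of evidence the abstract interprets as a shared topical or functional property of a register.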