Abstract
The Internet is highly multilingual, and its content is created, shared, debated and shaped within many different language communities. These communities do not exist in isolation but communicate with and influence each other's interests, just as in the offline world. Quantifying this influence is, however, a non-trivial task, as these communities are usually spread across multiple heterogeneous platforms. In this work, we set out to measure the influence of languages on each other by observing concept overlap between the 110 largest Wikipedia language editions. We describe experiments to test whether language overlap in concept coverage is a random process, and find that edition size is a strong predictor of higher concept overlap, with English–German being the most frequently co-occurring pair (45%). Both small and large editions co-occur more frequently than expected with editions of similar size, but co-occurrences across the two size groups are below what is expected by chance. Additionally, by applying network analysis, we find that the hierarchy of language interconnections differs depending on the locality of topics: for interlingually popular topics, the dominance of English, German and French is pronounced, while for topics with a local reach, geographical and cultural proximity as well as common heritage better explain co-occurrence.
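To make the idea of co-occurrence "above or below what is expected by chance" concrete, the following is a minimal sketch and not the procedure used in this work. It assumes a simple independence null model, in which the expected number of concepts shared by two editions is the product of their individual coverage counts divided by the total number of concepts. The function name `cooccurrence_ratios`, the concept identifiers and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_ratios(concepts):
    """Return observed/expected co-occurrence ratios per language pair.

    `concepts` maps a concept id to the set of language editions covering it.
    The expectation assumes editions cover concepts independently of each
    other (an illustrative null model, not necessarily the paper's).
    """
    n = len(concepts)
    lang_counts = Counter()   # number of concepts covered by each edition
    pair_counts = Counter()   # number of concepts covered by both editions
    for langs in concepts.values():
        lang_counts.update(langs)
        # Sort so that each pair has one canonical key, e.g. ("de", "en").
        pair_counts.update(combinations(sorted(langs), 2))

    ratios = {}
    for (a, b), observed in pair_counts.items():
        expected = lang_counts[a] * lang_counts[b] / n
        ratios[(a, b)] = observed / expected
    return ratios


# Hypothetical toy data: four concepts and the editions that cover them.
toy = {
    "Q1": {"en", "de", "fr"},
    "Q2": {"en", "de"},
    "Q3": {"en"},
    "Q4": {"fr"},
}
ratios = cooccurrence_ratios(toy)
for pair, ratio in sorted(ratios.items()):
    print(pair, round(ratio, 3))
```

A ratio above 1 indicates that a pair co-occurs more often than the independence model predicts, and a ratio below 1 indicates the opposite. Pairs that never co-occur are absent from the result rather than reported as 0.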