Which Languages do People Speak on Flickr?

Alireza Koochali, Christian Schulze, Damian Borth, Andreas Dengel, Sebastian Kalkowski

doi:10.1145/2983554.2983560

Abstract

Recently, the Yahoo Flickr Creative Commons 100 Million (YFCC100m) dataset was introduced to the computer vision and multimedia research community. This dataset consists of millions of images and videos spread across the globe. This geographic distribution suggests that a potentially large number of different languages are used in the titles, descriptions, and tags of these images and videos. Since the YFCC100m metadata does not provide any information about the languages used in the dataset, this paper presents the first analysis of this kind. The language and geo-location characteristics of the YFCC100m dataset are described by providing (a) an overview of the languages used, (b) associations between languages and countries, and (c) second-language usage in the dataset. Knowing the language used in titles, descriptions, and tags lets users of the dataset make language-specific decisions, e.g., selecting subsets of images to properly train classifiers or analyzing user behavior specific to a spoken language. This language information is also essential for further linguistic studies of the YFCC100m metadata.
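Once every item carries a language label and a country, the three analyses (a), (b), and (c) mostly reduce to counting. The Python sketch below is an illustration, not the authors' method. It assumes a hypothetical `Item` record whose `language` field was already produced by some language identifier and whose `country` was already derived from the item's geo-location. It also reads "second language" as a user's second most frequent language, which is only one plausible interpretation.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Item(NamedTuple):
    """One photo/video after language identification (hypothetical schema)."""
    user: str      # uploader ID
    country: str   # country assumed to be derived from the item's geo-location
    language: str  # language label assumed to come from some language identifier


def language_overview(items):
    """(a) Number of items per language, most common first."""
    return Counter(item.language for item in items).most_common()


def language_by_country(items):
    """(b) For each country, the fraction of its items written in each language."""
    per_country = defaultdict(Counter)
    for item in items:
        per_country[item.country][item.language] += 1
    return {
        country: {lang: n / sum(counts.values()) for lang, n in counts.items()}
        for country, counts in per_country.items()
    }


def second_languages(items):
    """(c) For each user who uses 2+ languages, their second most frequent one."""
    per_user = defaultdict(Counter)
    for item in items:
        per_user[item.user][item.language] += 1
    result = {}
    for user, counts in per_user.items():
        ranked = counts.most_common()  # ties keep first-seen order
        if len(ranked) >= 2:
            result[user] = ranked[1][0]
    return result
```

For example, a user with two German items and one English item gets `"en"` as their second language. Users who write in only one language are left out of the `second_languages` result rather than being assigned a placeholder.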