A benchmark of Spanish language datasets for computationally driven research

Gustavo Candela, Pilar Escobar, María-Dolores Sáez, Manuel Marco-Such

doi:10.1177/01655515211060530

Open Access

https://doi.org/10.1177/01655515211060530

Journal: Journal of Information Science
Publication Date: Dec 13, 2021
Citations: 2
License type: other-oa

Affiliation: University of Alicante

Abstract

In the domain of Galleries, Libraries, Archives and Museums (GLAM) institutions, creative and innovative tools and methodologies for content delivery and user engagement have recently gained international attention. New methods have been proposed to publish digital collections as datasets amenable to computational use. Standardised benchmarks can be useful to broaden the scope of machine-actionable collections and to promote cultural and linguistic diversity. In this article, we propose a methodology to select datasets for computationally driven research applied to Spanish text corpora. This work seeks to encourage Spanish and Latin American institutions to publish machine-actionable collections that follow best practices and avoid common mistakes.

Full Text