Improvements for research data repositories: The case of text spam

Ismael Vázquez, Rosalía Laza, María Novo-Lourés, José Ramón Méndez, Reyes Pavón, David Ruano-Ordás

doi:10.1177/0165551521998636

Abstract

Current research has evolved in such a way that scientists must not only adequately describe the algorithms they introduce and the results of their application, but also ensure that those results can be reproduced and compared with those obtained through other approaches. In this context, public data sets (sometimes shared through repositories) are one of the most important elements for the development of experimental protocols and test benches. This study analysed a significant number of CS/ML (Computer Science/Machine Learning) research data repositories and data sets and detected some limitations that hamper their utility. In particular, we identify and discuss the following functionalities that are in demand for repositories: (1) building customised data sets for specific research tasks, (2) facilitating the comparison of different techniques that use dissimilar pre-processing methods, (3) ensuring the availability of software applications to reproduce the pre-processing steps without using the repository functionalities and (4) providing protection mechanisms for licencing issues and user rights. To demonstrate these functionalities, we created the STRep (Spam Text Repository) web application, which implements our recommendations adapted to the field of spam text repositories. In addition, we launched an instance of STRep at https://rdata.4spam.group to facilitate understanding of this study.


Journal: Journal of Information Science
Publication Date: Mar 2, 2021
Citations: 2

