New Trends in Databases and Information Systems

Ciprian-Octavian Truică

doi:10.1007/978-3-319-67162-8

Abstract

Information retrieval from textual data focuses on the construction of vocabularies that contain weighted term tuples. Such vocabularies can then be exploited by various text analysis algorithms to extract new knowledge, e.g., top-k keywords, top-k documents, etc. Top-k keywords are commonly used for various purposes and are often computed on the fly, so they must be computed efficiently. Benchmarking is the customary way to compare competing weighting schemes and database implementations. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present a top-k keywords benchmark, T${}^2$K${}^2$, which features a real tweet dataset and queries with various complexities and selectivities. T${}^2$K${}^2$ helps evaluate weighting schemes and database implementations in terms of computing performance. To illustrate T${}^2$K${}^2$'s relevance and genericity, we successfully performed tests on the TF-IDF and Okapi BM25 weighting schemes, on the one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.


Similar Papers

T${}^2$K${}^2$: The Twitter Top-K Keywords Benchmark
Ciprian-Octavian Truică ... Jérôme Darmont
01 Jan 2017

Benchmarking top-k keyword and top-k document processing with T${}^2$K${}^2$ and T${}^2$K${}^2$D${}^2$
Ciprian-Octavian Truică ... Florin Rădulescu
Future Generation Computer Systems | VOL. 85
08 Mar 2018

A Generic Process to Refine a B Specification into a Relational Database Implementation
Régine Laleau ... Amel Mammar
01 Jan 1999

University Research Graph Database For Efficient Multi-Perspective Data Analysis Using Neo4j
Mohamad Irwan Afandi ... Eka Dyar Wahyuni
14 Oct 2020
