Multimedia Retrieval and Analysis with Cottontail DB

Ralph Gasser, Silvan Heller, Heiko Schuldt, Luca Rossetto

doi:10.1145/3577934.3577940

Abstract

Analysis and retrieval of media collections become increasingly challenging as the collections grow. Keeping everything in main memory becomes less feasible, and more time and effort have to be spent on data management. However, traditional relational databases do not support primitives often used in multimedia workloads, such as nearest-neighbour search on vectors. In this column, we introduce Cottontail DB, an open-source database management system for multimedia features. Cottontail DB supports traditional relational database operations, text retrieval based on Lucene and, most importantly, efficient vector-space retrieval operations for large datasets. Cottontail DB is the new data storage system powering the vitrivr multimedia retrieval stack, which was previously featured in the SIGMM Records [7]. Just like the other components of vitrivr, Cottontail DB is released under the permissive MIT license. It is written in Kotlin, runs on all major operating systems, and comes with a flexible and easy-to-use gRPC API, which makes it usable in many applications regardless of the programming language used. Cottontail DB's clean and modular architecture makes its functionality easy to extend and also makes it useful in an educational context. In the following, we give a brief introduction to how Cottontail DB works, what we use it for, and, most importantly, how it can help you manage your data. To learn more about Cottontail DB, including performance evaluations, we kindly refer readers to our Open Source Software Track contribution at ACM MM 2020 [1], where Cottontail DB was honored with that year's Best Open Source Award.

Full Text