In-RDBMS inverted indexes revisited

Ian Rae, Alan Halverson, Jeffrey F. Naughton

doi:10.1109/icde.2014.6816664

Abstract

Every major open-source and commercial RDBMS offers some form of support for full-text search using inverted indexes. When providing this support, some developers have implemented specialized indexes that adapt techniques from the Information Retrieval (IR) community to work in a database setting, while others have opted to rely on the standard relational query engine to process inverted index lookups. This choice is an important one, since the storage formats and algorithms used can vary greatly between a specialized index and a relational index, but these alternatives have not been thoroughly compared in the same system. Our work explores the differences in implementation and performance of three representative environments for an in-RDBMS inverted index: an in-RDBMS IR engine, a row-oriented relational query engine, and a column-oriented relational query engine. We found that a specialized IR engine integrated into the RDBMS can provide more than an order of magnitude speedup over both the row- and column-oriented relational query engines for conjunctive and phrase queries. For warm queries, this advantage is largely algorithmic, and we show that by using ZigZag merge join to accelerate conjunctive and phrase query processing, relational inverted indexes can provide performance comparable to a specialized in-RDBMS IR engine with no change to the underlying storage format. Compression and index format, in contrast, have more impact on cold queries, where the IR and column-oriented engines are able to outperform the row-oriented engine, even with ZigZag merge join.
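The abstract credits ZigZag merge join for most of the warm-query gap between relational and specialized inverted indexes. A minimal sketch of the idea follows. It assumes in-memory, sorted, duplicate-free posting lists of document ids, and it uses binary search (`bisect_left`) as a stand-in for an index seek such as a B-tree probe. Rather than scanning every list in lockstep, the cursor that trails the current candidate id seeks directly to it. This skips runs of non-matching postings. The phrase check is a simplified two-phase variant: ZigZag on document ids, then a position test. The paper's actual phrase-processing strategy and storage layout may differ. The names `seek`, `zigzag_intersect`, and `phrase_match`, and the dictionary index layout, are hypothetical and used only for illustration.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence

# Hypothetical positional index layout: term -> {doc_id -> sorted positions}.
PositionalIndex = Dict[str, Dict[int, List[int]]]


def seek(postings: Sequence[int], start: int, target: int) -> int:
    """Return the first index >= start whose doc id is >= target.

    Binary search stands in for an index seek (an assumption for this sketch).
    """
    return bisect_left(postings, target, start)


def zigzag_intersect(lists: List[Sequence[int]]) -> List[int]:
    """Intersect sorted, duplicate-free posting lists by ZigZag merge join."""
    if not lists or any(len(p) == 0 for p in lists):
        return []
    cursors = [0] * len(lists)
    candidate = lists[0][0]
    result: List[int] = []
    while True:
        matched = True
        for i, postings in enumerate(lists):
            # Jump this cursor forward to the first id >= candidate.
            cursors[i] = seek(postings, cursors[i], candidate)
            if cursors[i] == len(postings):
                return result  # one list is exhausted: no more matches
            doc = postings[cursors[i]]
            if doc > candidate:
                candidate = doc  # raise the target and start a new round
                matched = False
                break
        if matched:
            result.append(candidate)
            cursors[0] += 1
            if cursors[0] == len(lists[0]):
                return result
            candidate = lists[0][cursors[0]]


def phrase_match(index: PositionalIndex, terms: List[str]) -> List[int]:
    """Return doc ids in which `terms` occur at consecutive positions."""
    # Phase 1: conjunctive match on doc ids via ZigZag merge join.
    doc_lists = [sorted(index.get(t, {})) for t in terms]
    hits: List[int] = []
    for d in zigzag_intersect(doc_lists):
        # Phase 2: keep start positions p such that term k occurs at p + k.
        starts = set(index[terms[0]][d])
        for k, t in enumerate(terms[1:], start=1):
            starts &= {p - k for p in index[t][d]}
        if starts:
            hits.append(d)
    return hits
```

For example, `zigzag_intersect([[1, 3, 5, 7, 9], [3, 4, 5, 9, 10], [2, 3, 5, 8, 9]])` yields `[3, 5, 9]`. While evaluating the candidate 7, the second list seeks straight to 9 without inspecting any other entries. Because each seek resumes from the cursor's previous position, no list is ever rescanned from the start.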
