Information retrieval on the connection machine: 1 to 8192 gigabytes

Craig Stanfill, Robert Thau

doi:10.1016/0306-4573(91)90085-z

Abstract

This paper describes algorithms and data structures for applying a parallel computer to information retrieval. Previous work has described an implementation based on overlap-encoded signatures. That system was limited by (a) the necessity of keeping the signatures in primary memory and (b) the difficulties involved in implementing document-term weighting. Overcoming these limitations required adapting the inverted index techniques used on serial machines. The most obvious adaptation, also previously described, has the drawback that data must be sent between processors at query time. Because interprocessor communication is generally slower than local computation, an algorithm that avoids such communication might be faster. This paper presents a data structure, called a partitioned posting file, in which the interprocessor communication takes place at database-construction time, so that no data movement is needed at query time. Performance characteristics and storage overhead are established by benchmarking against a synthetic database. Based on these figures, it appears that currently available hardware can deliver interactive document ranking on databases containing between 1 and 8192 gigabytes of text.
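The idea of moving communication to construction time can be illustrated with a small sketch. The abstract does not give the layout of a partitioned posting file, so everything below is an assumption made for illustration: documents are assigned to partitions by `doc_id % num_partitions`, each partition stores only the postings of its own documents, and scores are a simple sum of query weight times term frequency. The names `build_partitioned_postings`, `score_partition`, and `rank` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
import heapq


def build_partitioned_postings(docs, num_partitions):
    """Build one local inverted index per partition.

    docs maps an integer doc_id to a list of terms. All routing of
    postings to partitions happens here, at construction time.
    """
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for doc_id, terms in docs.items():
        p = doc_id % num_partitions  # assumed placement rule, not from the paper
        counts = defaultdict(int)
        for term in terms:
            counts[term] += 1
        for term, tf in counts.items():
            # The posting is stored where the document's score will be kept.
            partitions[p][term].append((doc_id, tf))
    return partitions


def score_partition(index, query_weights):
    """Score the documents of one partition using only its local postings."""
    scores = defaultdict(float)
    for term, weight in query_weights.items():
        for doc_id, tf in index.get(term, ()):
            scores[doc_id] += weight * tf  # assumed weighting scheme
    return scores


def rank(partitions, query_weights, k):
    """Return the k best (doc_id, score) pairs over all partitions."""
    candidates = []
    for index in partitions:  # each iteration stands in for one processor
        local = score_partition(index, query_weights)
        candidates.extend(heapq.nlargest(k, local.items(), key=lambda kv: kv[1]))
    # Ties are broken in favor of the smaller doc_id.
    return heapq.nlargest(k, candidates, key=lambda kv: (kv[1], -kv[0]))


docs = {
    0: ["parallel", "retrieval"],
    1: ["parallel", "parallel", "index"],
    2: ["signature"],
    3: ["retrieval", "index", "parallel"],
}
partitions = build_partitioned_postings(docs, num_partitions=2)
top = rank(partitions, {"parallel": 1.0, "index": 0.5}, k=2)
```

In this sketch no posting crosses a partition boundary while a query is scored; only the short per-partition candidate lists are combined at the end. Whether and how the actual system performs that final selection is not stated in the abstract.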

Journal: Information Processing and Management
Publication Date: Jan 1, 1991
Citations: 49

Similar Papers

Partitioned posting files: a parallel inverted file structure for information retrieval
C Stanfill
01 Dec 1989

Linear-Space Approximate Distance Oracles for Planar, Bounded-Genus and Minor-Free Graphs
Ken-Ichi Kawarabayashi ... Christian Sommer
01 Jan 2010

Modelling of Communication in Unified Parallel and Distributed Computing Environment
Peter Hanuliak ... Michal Hanuliak
International Journal on Communications Antenna and Propagation (IRECAP) | VOL. 8
30 Apr 2018

Data structures for categorical path counting queries
Meng He ... Serikzhan Kazi
Theoretical Computer Science | VOL. 938
13 Oct 2022
