Optimal High-Performance Parallel Text Retrieval via Fat-Trees

B Mamalis

doi:10.1007/s002240000133

Abstract

We present a high-performance parallel free-text retrieval method for multiple text queries using the Vector Space Model. Our method employs the fat-tree area-universal routing network to connect the processors of a parallel machine; in its general form, however, it could also be applied efficiently over any other high-bandwidth network of processors or workstations. We provide a theoretical analysis of our technique which shows that it is highly efficient and clearly superior, with respect to both the amortized processing time and the average waiting time per query, to parallel text retrieval methods for single queries (e.g., those based on binary trees). Moreover, we prove that our method is optimal with respect to the execution of all the implied communication tasks on ideal fat-trees. We also demonstrate experimentally the high performance and superiority of our technique, using suitable embeddings of ideal fat-trees on realistic two-dimensional mesh-oriented parallel machines (e.g., the Parsytec GCel machine) and the large-scale TREC document collections. Note that the fat-tree can simulate any other network built from the same amount of hardware with only a polylogarithmic loss of efficiency.

Full Text