Abstract

Abstract : The bottleneck in many applications that process massive amounts of data is the I/O communications between internal memory and external memory. The bottleneck is accentuated as processors get faster and parallel processors are used. Parallel disk arrays are often used to increase the I/O bandwidth. The goal of this proposal is to deepen our understanding of the limits of I/O systems and to construct external memory algorithms that are provably efficient. The three measures of performance are number of I/Os, disk storage space, and CPU time. Even when the data fit entirely in memory, communication can still be the bottleneck, and the related issues of caching become important. Theoretical work involves development and analysis of provably efficient external memory algorithms and cache-efficient algorithms for a variety of important application areas. We address several batched and on-line problems, involving text databases, prefetching and streaming data from parallel disks, and database selectivity estimation. Our experimental validation uses our TPIE programming environment. Plans for the coming year are to address bottleneck issues in parallel disks, text databases, and XML databases.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call