Abstract

CD-ROM is an attractive delivery vehicle for full-text databases. Because of large storage capacity and low access speed, carefully designed indexing structures, including a concordance, are necessary to enable the text to be retrieved efficiently. However, the indexes are sufficiently large that they tax the ability of main store to hold them when processing queries. The use of compression techniques can substantially increase the volume of text that a disk can accommodate, and substantially decrease the amount of primary storage needed to hold the indexes. This paper describes a suitable indexing mechanism, and its compression potential using modern compression methods. It is possible to double the amount of text that can be stored on a CD-ROM disk and include a full concordance and indexes as well. A single disk can accommodate around 180 million words of text (equivalent to a library of 1000-1500 books) and provide rapid response to a variety of queries involving multiple search terms and word fragments.
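
To make the idea of a compressed concordance concrete, the following is a minimal sketch and not the mechanism described in the paper. It assumes a concordance that maps each word to the ascending list of word positions where it occurs, and it uses gap encoding followed by a simple variable-length byte code as a stand-in for the modern compression methods mentioned above. The function names `build_concordance`, `encode_positions`, and `decode_positions` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def build_concordance(text):
    """Map each lowercased word to the ascending list of its word positions."""
    concordance = defaultdict(list)
    for position, word in enumerate(text.lower().split()):
        concordance[word].append(position)
    return dict(concordance)


def encode_positions(positions):
    """Gap-encode ascending positions, then pack each gap into variable-length bytes.

    Assumption for illustration: a byte-aligned code, 7 payload bits per byte,
    low-order bits first, with the high bit set when more bytes follow.
    """
    out = bytearray()
    previous = 0
    for position in positions:
        gap = position - previous  # small gaps need fewer bytes than raw positions
        previous = position
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_positions(data):
    """Invert encode_positions: rebuild the ascending position list."""
    positions = []
    current = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation byte: more bits of this gap follow
        else:
            current += gap
            positions.append(current)
            gap = 0
            shift = 0
    return positions


if __name__ == "__main__":
    concordance = build_concordance("the cat saw the dog and the bird")
    packed = encode_positions(concordance["the"])
    print(concordance["the"], len(packed), decode_positions(packed))
```

Frequent words have short gaps between occurrences, so most of their entries fit in a single byte each. That is the intuition behind compressing the index so that less primary storage is needed to hold it while processing queries.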
