Abstract

Computer programs that access significant amounts of text usually include code that manipulates the textual objects of which that text is composed. Such programs include electronic mail readers, typesetters and, in particular, full-text information retrieval systems. Such code is often unsatisfying because access to textual objects is either efficient or flexible, but not both. A programming language like Awk or Perl provides very general facilities for describing textual objects, but at the cost of rescanning the text for every textual object. At the other extreme, full-text information retrieval systems usually offer access to only a very limited number of kinds of textual objects, but this access is very efficient. The system described in this paper is a programming tool for managing textual objects. It provides a great deal of flexibility, giving access to very complex document structure with a large number of constituent kinds of textual objects. Further, it provides access to these objects very efficiently, in terms of both time and auxiliary space, by accessing secondary storage only when absolutely necessary.
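
To make the flexibility/efficiency trade-off above concrete, here is a minimal in-memory sketch. It is not the system described in the paper, whose design the abstract does not detail, and it ignores secondary storage entirely. It contrasts the Awk/Perl style of rescanning the text with a pattern every time a kind of object is needed against computing the extents of each kind once and answering containment queries ("which words lie inside this sentence?") by binary search. The names `rescan_objects` and `RegionIndex`, and the regular expressions used for "sentence" and "word", are hypothetical choices made for illustration.

```python
import bisect
import re


def rescan_objects(text: str, pattern: str) -> list[tuple[int, int]]:
    """Awk/Perl style: scan the whole text again to find every object of one kind."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


class RegionIndex:
    """Hypothetical index: scan once per kind, then answer queries from the stored extents.

    Assumes each kind's regions are non-overlapping and sorted by start offset,
    which holds for matches produced by re.finditer.
    """

    def __init__(self, text: str, kinds: dict[str, str]):
        self.text = text
        # One pass over the text per kind, done up front instead of per query.
        self.regions = {name: rescan_objects(text, pat) for name, pat in kinds.items()}
        self.starts = {name: [s for s, _ in regs] for name, regs in self.regions.items()}

    def within(self, inner: str, outer_start: int, outer_end: int) -> list[tuple[int, int]]:
        """Return regions of kind `inner` lying entirely inside [outer_start, outer_end)."""
        regs = self.regions[inner]
        # Binary search skips all regions that start before the outer region.
        i = bisect.bisect_left(self.starts[inner], outer_start)
        result = []
        # Regions are sorted and disjoint, so we can stop at the first one starting past the end.
        while i < len(regs) and regs[i][0] < outer_end:
            start, end = regs[i]
            if end <= outer_end:
                result.append((start, end))
            i += 1
        return result


# Usage: index sentences and words once, then query words inside the second sentence.
text = "Mail is text. Typesetters read text too."
idx = RegionIndex(text, {"sentence": r"[^.\s][^.]*\.", "word": r"[A-Za-z]+"})
second_start, second_end = idx.regions["sentence"][1]
words = [text[s:e] for s, e in idx.within("word", second_start, second_end)]
print(words)
```

The indexing pass costs one scan per kind of object, after which each query touches only the relevant stored extents rather than the text itself; the price is the auxiliary space for the extents, which is the resource the abstract says the described system is careful to economize.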
