Associative/parallel processors for searching very large textual data bases

R M Bird, J C Tu, R M Worthy

doi:10.1145/965645.810247

Abstract

This paper describes an approach to solving a major problem in the information processing sciences: that of searching very large (5-50 billion characters) data bases of unstructured free text for random queries within a reasonable time and at an affordable price. The need by information specialists and knowledge workers for large, fast, low-cost text and document retrieval systems is growing rapidly. Conventional approaches to the problem have usually depended upon expensive, general purpose computers, upon special preprocessing of the textual data (e.g., file inverting, indexing, abstracting, etc.), and upon elaborate, costly software. The resulting retrieval systems often cost hundreds of dollars per query, and the full scanning of an uninverted, unstructured billion byte textual data base could take hours of computer services. However, in spite of these restrictions, such full text search systems have proved useful and even indispensable for many applications. Computer technology of the late 1960's and the 1970's, in both hardware and software (e.g., minicomputers, low-cost, high density disk storage, “chip” electronics, natural language query systems, etc.), has made it practical to build special purpose, low-cost text retrieval systems. Such a system has been built, tested, and is now in a production stage. The system, called the Associative File Processor (AFP), utilizes a conventional minicomputer (DEC's PDP-11/45) for control, off-the-shelf high density disks for storage, a special purpose parallel search module as a text term detector, and query and retrieval software. The AFP is currently being field tested at two sites. Full text, parallel searches on un-preprocessed textual data bases are being performed at effective matching rates of 4 billion bytes per second (8K byte key memory times 500 Kbyte/second data stream). Estimated costs are 10 to 25 cents per query for a one billion byte data base. The costs per query and the time for searching increase linearly as the data base grows. A basic architecture for the AFP is described and an implemented version is discussed. A more powerful term detector module is also under development. This system is designed around a finite state automaton algorithm.
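
The figures quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, "8K" is assumed to mean 8 × 1024 bytes, "500 Kbyte/second" is assumed to mean 500,000 bytes per second, and the scan-time estimate assumes a single sequential data stream, which the abstract does not state.

```python
# Back-of-the-envelope check of the rates quoted in the abstract.
# Assumptions (not from the source): K = 1024 for key memory,
# K = 1000 for the stream rate, and one sequential data stream.

KEY_MEMORY_BYTES = 8 * 1024          # "8K byte key memory"
STREAM_RATE_BYTES_PER_S = 500_000    # "500 Kbyte/second data stream"


def effective_match_rate(key_bytes: int, stream_rate: int) -> int:
    """Byte comparisons per second if every key byte is compared
    against every byte of the data stream in parallel."""
    return key_bytes * stream_rate


def scan_seconds(db_bytes: float, stream_rate: int) -> float:
    """Time to stream the whole data base once; linear in its size."""
    return db_bytes / stream_rate


rate = effective_match_rate(KEY_MEMORY_BYTES, STREAM_RATE_BYTES_PER_S)
print(f"effective matching rate: {rate:,} bytes/s")

for size in (1e9, 5e9, 50e9):
    secs = scan_seconds(size, STREAM_RATE_BYTES_PER_S)
    print(f"{size:.0e} bytes -> {secs:,.0f} s")
```

Under these assumptions the product is about 4.1 billion byte comparisons per second, in line with the quoted 4 billion, and the scan time grows in direct proportion to the size of the data base, as the abstract states for both search time and cost per query.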

Journal: ACM SIGMOD Record
Publication Date: Jan 1, 1977
Citations: 2

