Retrieval of bibliographic records using Apache Lucene

Branko Milosavljević, Dušan Surla, Danijela Boberić

doi:10.1108/02640471011065355

Abstract

Purpose: The aim of the research is to model and implement a software component for the retrieval of bibliographic records using the Apache Lucene retrieval engine.

Design/methodology/approach: Object-oriented methodology is used for modeling and implementing the bibliographic record retrieval engine. Modeling is carried out in a CASE tool that supports the Unified Modeling Language (UML 2.0), while the implementation uses the Java programming language and open source components.

Findings: The result is a software component for the retrieval of bibliographic records that is independent of the bibliographic format used in cataloging. Search types can be configured without changing the software implementation.

Research limitations/implications: One constraint of this system concerns searching linking entry fields. The UNIMARC format defines fields used to link the item being cataloged to another bibliographic item, so those fields may contain other fields, which can be termed secondary fields. In the proposed solution, secondary fields are treated like all other fields, and no information is kept on whether a search term belongs to a secondary or a regular field.

Practical implications: The proposed solution is integrated into version 4 of the BISIS library information system. This version of BISIS is in use at university, public and special libraries. Introducing this version improved system performance and the flexibility of the indexing process, while enabling librarians to perform sophisticated and effective retrieval of bibliographic records.

Originality/value: The contribution of this work is the design of a customizable record retrieval component. It is configured by means of an XML document that specifies mapping rules between subfields of the bibliographic record format and search types. Because the rules are expressed in XML, new mapping rules can be added without additional programming. In addition, great attention has been paid to the indexing of subfields that contain punctuation marks with special semantic meaning for librarians, and to transliteration between Cyrillic and Latin scripts. The originality of this work also lies in using the Apache Lucene search engine, which facilitates building highly flexible and efficient retrieval systems.
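The XML-driven mapping between subfields and search types can be illustrated with a minimal Java sketch. It is not the BISIS implementation: the element and attribute names (`mapping`, `prefix`, `subfield`, `name`, `code`), the search-type labels (`TI`, `AU`, `KW`) and the class name `MappingRulesSketch` are assumptions made for illustration only. The sketch reads the rules with the standard JDK XML parser and builds a lookup from each subfield to the search types it feeds, which is the information an indexer would need when deciding which index fields to populate.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: the XML vocabulary below is assumed, not the BISIS schema.
final class MappingRulesSketch {

    // Assumed sample rules: each search type lists the UNIMARC subfields it indexes.
    static final String SAMPLE =
          "<mapping>"
        + "  <prefix name=\"TI\"><subfield code=\"200a\"/></prefix>"
        + "  <prefix name=\"AU\"><subfield code=\"700a\"/><subfield code=\"200f\"/></prefix>"
        + "  <prefix name=\"KW\"><subfield code=\"200a\"/></prefix>"
        + "</mapping>";

    /** Returns, for each subfield code, the search types it contributes to. */
    static Map<String, List<String>> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, List<String>> bySubfield = new LinkedHashMap<>();
        NodeList prefixes = doc.getElementsByTagName("prefix");
        for (int i = 0; i < prefixes.getLength(); i++) {
            Element prefix = (Element) prefixes.item(i);
            String searchType = prefix.getAttribute("name");
            NodeList subfields = prefix.getElementsByTagName("subfield");
            for (int j = 0; j < subfields.getLength(); j++) {
                String code = ((Element) subfields.item(j)).getAttribute("code");
                // One subfield may feed several search types.
                bySubfield.computeIfAbsent(code, k -> new ArrayList<>()).add(searchType);
            }
        }
        return bySubfield;
    }

    public static void main(String[] args) throws Exception {
        // Print the subfield-to-search-type lookup built from the sample rules.
        System.out.println(parse(SAMPLE));
    }
}
```

Under this assumed layout, adding a new search type means adding one more `prefix` element to the XML document; the Java code stays unchanged.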

Full Text