Retrieving information (Purpose 2)

David Haynes

doi:10.29085/9781783302161.009

Abstract

Overview

Metadata standards such as Dublin Core and MODS were designed to improve the retrieval of web resources and discoverability of digital information resources. This chapter considers the role of metadata in information retrieval. It begins with a review of information retrieval concepts and measures of retrieval performance before considering the impact of metadata on retrieval. Reference is made to models for resource description and subject indexing. The final part of the chapter examines the relationship between subject indexing and computational methods of retrieval.

The role of metadata in information retrieval

Van Rijsbergen (1979, 1–2) makes the distinction between information retrieval and data retrieval. Information retrieval looks at the existence or nonexistence of a document or information resource that matches the search criteria. This lends itself to document (information resource) descriptions, also known as metadata. Data retrieval, on the other hand, is about obtaining factual answers to a question, although the boundaries are being blurred with the development of fact-based retrieval systems such as Google's Knowledge Graph. Data retrieval can also be described in terms of metadata, although here the emphasis is on data dictionaries that define the structure of the database rather than describing document content.

The focus of this chapter is on information retrieval rather than data retrieval. This ties in with the overall scope of this book on describing document content. However, it does deal with documents of all types, from mainly text-based through to multimedia materials.

Metadata improves the discoverability of information resources by describing the content in a variety of ways. Cochrane (1982) talks about subject access as being systematic, topical or natural (free-text). The systematic approach is via a classification or taxonomy which provides a formal language for describing the content of the resource. The topical approach uses subject headings, which may be derived from a controlled language or may be free text. The natural approach uses free text or natural language, i.e. text retrieval from the content of the information resource itself. Considerable effort has gone into text retrieval algorithms, primarily to rank the results in a way that is meaningful and relevant to the searcher.
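The three routes to subject access can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Record` fields, the three sample records and the classification notations ("Z1.2", "Z2.1") are invented and do not come from any real scheme or from the chapter. Each function stands for one approach: matching on a classification notation (systematic), on an assigned subject heading (topical), or on the words of the resource itself (natural).

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical resource description; field names are illustrative only."""
    title: str
    classification: str   # systematic: notation from an invented classification
    subjects: list[str]   # topical: assigned subject headings
    full_text: str        # natural: the content of the resource itself


# Invented sample data, used only to show how the three routes differ.
RECORDS = [
    Record("Metadata for digital libraries", "Z1.2",
           ["Metadata", "Digital libraries"],
           "Dublin Core elements describe web resources."),
    Record("Ranking in search engines", "Z2.1",
           ["Information retrieval"],
           "Ranking algorithms order results by relevance to the query."),
    Record("Cataloguing maps", "Z1.2",
           ["Cataloguing", "Maps"],
           "Describing cartographic materials requires scale and projection metadata."),
]


def by_classification(records, prefix):
    """Systematic access: match on the leading part of the notation."""
    return [r.title for r in records if r.classification.startswith(prefix)]


def by_subject(records, heading):
    """Topical access: exact, case-insensitive match on a subject heading."""
    wanted = heading.casefold()
    return [r.title for r in records
            if any(s.casefold() == wanted for s in r.subjects)]


def by_free_text(records, term):
    """Natural access: case-insensitive substring match on the content."""
    wanted = term.casefold()
    return [r.title for r in records if wanted in r.full_text.casefold()]


if __name__ == "__main__":
    print(by_classification(RECORDS, "Z1"))
    print(by_subject(RECORDS, "metadata"))
    print(by_free_text(RECORDS, "metadata"))
```

In this toy data the same word, "metadata", retrieves one record through its subject heading and a different record through its text, which is the kind of divergence between indexing and free-text searching that the chapter goes on to discuss. The free-text function only tests for the presence of a term; it does not attempt the ranking that text retrieval algorithms provide.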
