The integration of information retrieval techniques within a software reuse environment

F Gibb, C Mccartan, N Sweeney, R Leon, R O'Donnell

doi:10.1177/0165551004233221

Abstract

This paper describes the development of an information retrieval (IR) model for the indexing, storage and retrieval of documents created in extensible mark-up language (XML). The application area is the software reuse environment, which involves a broader class of documents than can be processed by conventional IR systems. This class includes design and analysis documents in unified modelling language (UML) notation as well as in textual format, source code, and component interface definitions in both textual and source code form. XML was selected because it is emerging as the key standard for the representation of structured documents on the World Wide Web (WWW) and incorporates methods for the representation of metadata. The model described is easily customisable, since it is based upon an extensible object-oriented framework. This allows the development of an IR architecture that can easily be adapted to cope with the proliferation of XML document type definitions (DTDs) that is likely to be a characteristic of the WWW in the near future.
