Abstract
Purpose: The purpose of this paper is to investigate the search behavior of institutional repository (IR) users in regard to subjects as a means of estimating the potential impact of applying a controlled subject vocabulary to an IR.
Design/methodology/approach: Google Analytics data were used to record cases where users arrived at an IR item page from an external web search and subsequently downloaded content. Search queries were compared against the Faceted Application of Subject Terminology (FAST) schema to determine the topical nature of the queries. Queries were also compared against the item’s metadata values for title and subject using approximate string matching to determine the alignment of the queries with current metadata values.
Findings: A substantial portion of successful user search queries to an IR appear to be topical in nature. User search queries matched values from FAST at a higher rate than existing subject metadata. Increased attention to subject description in IR records may provide an opportunity to improve the search visibility of the content.
Research limitations/implications: The study is limited to a particular IR. Data from Google Analytics do not provide comprehensive search query data.
Originality/value: The study presents a novel method for analyzing user search behavior to assist IR managers in determining whether to invest in applying controlled subject vocabularies to IR content.
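The approximate string matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the use of Python's standard-library `difflib`, the whitespace and case normalization, the 0.8 threshold, and the function names `normalize`, `similarity`, and `best_match` are all assumptions made for illustration. The abstract does not state which algorithm or cutoff the study used.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(query, candidates, threshold=0.8):
    """Return (candidate, score) for the closest candidate, or None.

    `candidates` stands in for an item's title and subject values, or
    for a list of controlled vocabulary headings. The threshold is a
    hypothetical cutoff, not a value taken from the study.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        score = similarity(query, candidate)
        if score > best_score:
            best, best_score = candidate, score
    if best is not None and best_score >= threshold:
        return best, best_score
    return None


if __name__ == "__main__":
    # Hypothetical metadata values for a single item.
    subjects = ["Water quality", "Soil erosion"]
    print(best_match("water qualty", subjects))
    print(best_match("quantum computing", subjects))
```

Under these assumptions, a misspelled query can still align with an existing subject value, while an unrelated query returns `None`. Running the same comparison against both the item's current metadata and a controlled vocabulary gives the two match rates that the findings contrast.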
Highlights
The application of controlled subject vocabularies is a means of improving resource discovery
The other 85% of BitStream click events occurred in cases other than a user landing directly on an item page from a search, or where Google Analytics did not record a search Keyword
Of the random sample of 300 queries that were further analyzed, 97 queries (32%) were manually split into discrete topics and the queries were again reconciled against the Faceted Application of Subject Terminology (FAST) vocabulary
Summary
The application of controlled subject vocabularies is a means of improving resource discovery. Applying controlled subject vocabularies to IR records can incur significant costs. This is especially true in cases where the controlled vocabulary is to be applied retroactively to repository content that has been submitted by a variety of users. Compounding the problem, IRs often include a wide range of content, from articles and gray literature to institutional records, in a wide range of disciplines. Such scenarios can result in a great diversity of subject and keyword terms applied unevenly across content and over a significant period of time. After-the-fact metadata remediation and enhancement can require a great deal of effort.