Purpose
The Confederation of Open Access Repositories (COAR) prescribes three types of controlled vocabularies for open access repositories (OARs): access rights, resource type and version type. COAR does not, however, recommend a subject-specific vocabulary for organising content, even though subject is one of the search categories information seekers use most. The purpose of this study is to investigate the use of controlled vocabularies for subject arrangement in OARs.

Design/methodology/approach
The study comprises eight stages. The first is to identify the total number of repositories listed in OpenDOAR under the social science domain in India. Next, the samples are selected, and their data are harvested using the MarcEdit OAI-PMH harvester plug-ins. The harvested data are then processed, and a consolidated controlled vocabulary is constructed by merging six existing thesauri. A similarity matching algorithm is then used to determine how far controlled vocabularies are used for subject arrangement. The last step is to evaluate the efficacy of the controlled vocabulary in information retrieval.

Findings
The results revealed that subject arrangement differs widely across repositories. The study also showed that the use of controlled vocabularies for subject arrangement in OARs still needs to be standardised to enhance interoperability.

Originality/value
This research indicates that controlled terms, by standardising subject metadata, address issues such as ambiguity, inconsistency and synonym variation that typically arise with uncontrolled keywords. This standardisation gives users a more reliable and user-friendly search experience, ultimately improving the discoverability and usability of open access content.
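The abstract does not specify the similarity matching algorithm. The following is a minimal sketch, assuming a plain string-similarity measure (Python's standard difflib.SequenceMatcher) and a hypothetical 0.9 threshold. It merges several thesauri into one normalised vocabulary and then estimates what share of a repository's harvested subject keywords correspond to a controlled term. The function names, the threshold and the sample terms are illustrative assumptions, not the study's actual method or data.

```python
from difflib import SequenceMatcher


def normalise(term: str) -> str:
    """Case-fold and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def build_vocabulary(*thesauri: list[str]) -> set[str]:
    """Merge several thesauri into one de-duplicated set of normalised terms."""
    return {normalise(t) for thesaurus in thesauri for t in thesaurus}


def best_match(keyword: str, vocabulary: set[str]) -> tuple[str, float]:
    """Return the controlled term most similar to a keyword and its ratio (0..1).

    Returns ("", 0.0) if the vocabulary is empty.
    """
    key = normalise(keyword)
    scored = ((term, SequenceMatcher(None, key, term).ratio()) for term in vocabulary)
    return max(scored, key=lambda pair: pair[1], default=("", 0.0))


def controlled_share(keywords: list[str], vocabulary: set[str],
                     threshold: float = 0.9) -> float:
    """Fraction of subject keywords whose best match meets the (assumed) threshold."""
    if not keywords:
        return 0.0
    hits = sum(1 for k in keywords if best_match(k, vocabulary)[1] >= threshold)
    return hits / len(keywords)


if __name__ == "__main__":
    # Hypothetical thesauri and repository keywords, for illustration only.
    vocab = build_vocabulary(["Sociology", "Economics"], ["Political science", "economics"])
    keywords = ["sociology", "Econmics", "village life"]
    print(sorted(vocab))
    print(controlled_share(keywords, vocab))
```

Here the misspelt "Econmics" still matches "economics" (ratio about 0.94), while "village life" matches nothing, so two of the three keywords count as controlled. A string ratio catches spelling variants but not true synonyms; a real study might instead map synonyms through the thesauri's equivalence relationships.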