Abstract

To support clinical researchers, librarians and informationists may need search filters for particular tasks. Development of filters typically depends on a "gold standard" dataset. This paper describes generalizable methods for creating a gold standard to support future filter development and evaluation using oral squamous cell carcinoma (OSCC) as a case study. OSCC is the most common malignancy affecting the oral cavity. Investigation of biomarkers with potential prognostic utility is an active area of research in OSCC. The methods discussed here should be useful for designing quality search filters in similar domains. The authors searched MEDLINE for prognostic studies of OSCC, developed annotation guidelines for screeners, ran three calibration trials before annotating the remaining body of citations, and measured inter-annotator agreement (IAA). We retrieved 1,818 citations. After calibration, we screened the remaining citations (n = 1,767; 97.2%); IAA was substantial (kappa = 0.76). The dataset has 497 (27.3%) citations representing OSCC studies of potential prognostic biomarkers. The gold standard dataset is likely to be high quality and useful for future development and evaluation of filters for OSCC studies of potential prognostic biomarkers. The methodology we used is generalizable to other domains requiring a reference standard to evaluate the performance of search filters. A gold standard is essential because the labels regarding relevance enable computation of diagnostic metrics, such as sensitivity and specificity. Librarians and informationists with data analysis skills could contribute to developing gold standard datasets and subsequent filters tuned for their patrons' domains of interest.
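The agreement statistic reported above can be illustrated with a small sketch. It assumes Cohen's kappa for two annotators making include/exclude decisions; the abstract does not say which kappa variant was used. The function name `cohens_kappa` and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items (assumed variant)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of each annotator's marginal rate.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy labels (1 = relevant, 0 = not relevant); not the study's data.
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

Kappa discounts the agreement expected by chance alone, which is why it is usually preferred to raw percentage agreement when one label (here, "not relevant") dominates the citation set.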

Highlights

  • The biomedical literature is ever growing and poses a serious challenge to researchers and clinicians who need to find relevant literature

  • The retrieval set comprises citations for prognostic studies of oral squamous cell carcinoma (OSCC), a subset of which concern potential prognostic biomarkers

  • The implication is that the OSCC gold standard dataset is likely to be of high quality and will be a useful reference standard for subsequent filter development


Summary

Introduction

The biomedical literature is ever growing and poses a serious challenge to researchers and clinicians who need to find relevant literature. More than 2 million biomedical articles were published in North America from 2000–2009, a 42% increase compared with the previous decade [3]. Finding what one needs in a large database can be a daunting task, especially for the naïve user [4]. The naïve user will generally enter keywords to retrieve a list of citations, many of which may be relevant. The list could run to thousands of citations. At this point, the user might try other combinations of keywords to increase precision.
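To make the notion of precision concrete, the following sketch scores a search against a labelled reference set. It is only an illustration under stated assumptions: the function `filter_metrics`, the citation identifiers, and the toy labels are hypothetical and do not come from the OSCC dataset.

```python
def filter_metrics(gold, retrieved):
    """Score a search against a labelled reference set.

    gold: dict mapping citation id -> True if judged relevant (hypothetical format).
    retrieved: set of citation ids returned by the search or filter.
    """
    tp = sum(1 for cid, rel in gold.items() if rel and cid in retrieved)
    fn = sum(1 for cid, rel in gold.items() if rel and cid not in retrieved)
    fp = sum(1 for cid, rel in gold.items() if not rel and cid in retrieved)
    tn = sum(1 for cid, rel in gold.items() if not rel and cid not in retrieved)

    def ratio(num, den):
        # Return None instead of dividing by zero when a metric is undefined.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # relevant citations that were retrieved
        "specificity": ratio(tn, tn + fp),  # irrelevant citations that were excluded
        "precision": ratio(tp, tp + fp),    # retrieved citations that are relevant
    }


# Hypothetical toy reference set and search result; not the study's data.
gold = {"c1": True, "c2": True, "c3": False, "c4": False,
        "c5": True, "c6": False, "c7": False, "c8": False}
retrieved = {"c1", "c2", "c3"}
metrics = filter_metrics(gold, retrieved)
print(metrics)
```

Without relevance labels for every citation in the set, the false negatives and true negatives cannot be counted, so sensitivity and specificity cannot be computed at all.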

