Abstract

The Gene hierarchy of the National Cancer Institute (NCI) Thesaurus (NCIt) is a high priority for NCI, so quality assurance (QA) techniques that improve its content quality are important. We present a two-step methodology for auditing the modeling of complex concepts, which are shown to have a higher error rate than control concepts. In the first step, we test whether concepts that appear complex in a so-called “partial-area taxonomy” have a higher error rate than control concepts. In the second step, we introduce an innovative technique based on a “partial-area sub-taxonomy” (constructed with a subset of roles) to discover additional complex concepts. The results of the QA study show that the complex concepts identified are indeed statistically significantly more likely to contain errors than control concepts. Directing auditing effort toward such concepts makes it easier for NCI staff to improve the modeling quality of gene concepts in NCIt.

