Abstract

Background: As biomedical knowledge evolves rapidly, concept enrichment of biomedical terminologies, the automatic identification of missing or new concepts, is an active research area. Previously, we prototyped a lexical-based formal concept analysis (FCA) approach, in which concepts were derived by intersecting bags of words, to identify potentially missing concepts in the National Cancer Institute (NCI) Thesaurus. However, this prototype did not handle concept naming and positioning. In this paper, we introduce a sequence-based FCA approach that identifies potentially missing concepts and supports concept naming and positioning.

Methods: We treat concept name sequences as FCA attributes to construct the formal context. New concepts are formed by computing the longest common substrings of concept name sequences. After new concepts are formalized, we predict their potential positions in the original hierarchy by identifying their supertypes and subtypes among the original concepts. To evaluate the effectiveness of our approach, we perform automated validation against external terminologies in the Unified Medical Language System (UMLS) and biomedical literature in PubMed.

Results: We applied our sequence-based FCA approach to all sub-hierarchies under Disease or Disorder in the NCI Thesaurus (version 19.08d) and to five sub-hierarchies under Clinical Finding and Procedure in SNOMED CT (US Edition, March 2020 release). In total, 1397 potentially missing concepts were identified in the NCI Thesaurus and 7223 in SNOMED CT. For the NCI Thesaurus, 85 potentially missing concepts were found in external terminologies, and 315 of the remaining 1312 appeared in biomedical literature. For SNOMED CT, 576 were found in external terminologies, and 1159 of the remaining 6647 appeared in biomedical literature.

Conclusion: Our sequence-based FCA approach shows promise for identifying potentially missing concepts in biomedical terminologies.
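
As an illustration of the concept-forming step described in the Methods, the following minimal sketch computes the longest common contiguous word run between pairs of concept names and keeps runs that are not already concept names. It is not the authors' implementation: the word-level tokenization, the pairwise-only comparison (rather than a full FCA closure), the two-word minimum, the function names, and the example concept names are all assumptions made for illustration.

```python
from itertools import combinations


def longest_common_word_run(a, b):
    """Return the longest contiguous run of words shared by word lists a and b.

    Ties are broken in favor of the run that ends earliest in a.
    """
    best_len, best_end = 0, 0
    # prev[j] holds the length of the common run ending at a[i-2], b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def candidate_concepts(names, min_words=2):
    """Collect shared word runs that are not already existing concept names.

    Assumption: only pairwise intersections are considered, which is a
    simplification of building the full formal concept lattice.
    """
    existing = {n.lower() for n in names}
    tokenized = [n.lower().split() for n in names]
    candidates = set()
    for a, b in combinations(tokenized, 2):
        run = longest_common_word_run(a, b)
        name = " ".join(run)
        if len(run) >= min_words and name not in existing:
            candidates.add(name)
    return candidates


# Hypothetical concept names, chosen only to exercise the sketch.
names = [
    "Recurrent Skin Squamous Cell Carcinoma",
    "Metastatic Skin Squamous Cell Carcinoma",
    "Skin Basal Cell Carcinoma",
]
result = candidate_concepts(names)
print(sorted(result))
```

Because the shared run is contiguous, it can serve directly as a name for the candidate concept, which is the property the bag-of-words prototype lacked. Positioning and validation against UMLS or PubMed are outside the scope of this sketch.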
