NCBO Annotator Research Articles

Introduction/Background
Recently, histopathology has seen the introduction of several tools, such as slide scanners and virtual slide technologies, creating the conditions for broader adoption of computer-aided diagnosis based on whole slide images (WSI) to reduce inter-observer variability between pathologists. This change raises a number of new scientific challenges, such as the sustainable management of the semantics associated with the grading process, image analysis and annotation, in order to facilitate pre-filled report generation. The College of American Pathologists cancer checklists and protocols (CAP-CC&P) [1] are reference resources for complete Anatomic Pathology (AP) reporting of malignant tumors. Current terminology systems for AP structured reporting gather terms of very different granularity [2][3] and have not yet been compiled systematically. Semantic data models are formal representations of knowledge in a given domain that allow both human users and software applications to interpret domain terminology consistently and accurately [4][5].

Aims
Our objective is to i) analyze the histopathological knowledge for breast cancer grading available in the reference CAP-CC&P and ii) build a sustainable formal representation of this knowledge based on existing biomedical ontologies in NCBO BioPortal [6][7] and UMLS semantic types [8].

Methods
Our methodology was first tested on two cancer grading methods: one for invasive breast carcinoma (Nottingham system) and one for ductal carcinoma in situ. An AP expert first selected a corpus of 5 texts, or “notes”, from the two corresponding CAP-CC&Ps. From each note, the expert also extracted a list of key concepts to be used as a “gold standard”. We used NCBO Annotator [9] for automatic analysis of the corpus. Annotator supports the biomedical community in tagging raw texts automatically with concepts from relevant biomedical ontology and terminology repositories.
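To make the annotation step concrete, here is a minimal sketch of sending one note to the Annotator and grouping the returned concepts by ontology. It assumes the BioPortal REST endpoint `https://data.bioontology.org/annotator` with `text`, `ontologies` (comma-separated acronyms) and `apikey` query parameters, and assumes each result carries `annotatedClass["@id"]` and an `annotatedClass["links"]["ontology"]` URL; these reflect our understanding of the public API and should be checked against the current BioPortal documentation. The helper names (`build_annotator_url`, `concepts_by_ontology`, `annotate_note`) are hypothetical and not part of the authors' method.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed REST endpoint of the NCBO (BioPortal) Annotator; verify it against
# the current BioPortal API documentation before relying on it.
ANNOTATOR_URL = "https://data.bioontology.org/annotator"


def build_annotator_url(text, ontologies, apikey):
    """Build a GET request asking the Annotator to tag `text` with
    concepts from the listed ontology acronyms (hypothetical helper)."""
    query = urlencode({
        "text": text,
        "ontologies": ",".join(ontologies),  # comma-separated acronyms
        "apikey": apikey,                    # personal BioPortal API key
    })
    return f"{ANNOTATOR_URL}?{query}"


def concepts_by_ontology(response_body):
    """Group annotated class IRIs by ontology acronym.

    Assumes each result has annotatedClass["@id"] and an
    annotatedClass["links"]["ontology"] URL whose last segment is the acronym.
    """
    grouped = {}
    for result in json.loads(response_body):
        cls = result["annotatedClass"]
        acronym = cls["links"]["ontology"].rstrip("/").rsplit("/", 1)[-1]
        grouped.setdefault(acronym, set()).add(cls["@id"])
    return grouped


def annotate_note(text, ontologies, apikey):
    """Send one note to the Annotator (network call) and group the results."""
    with urlopen(build_annotator_url(text, ontologies, apikey)) as resp:
        return concepts_by_ontology(resp.read().decode("utf-8"))
```

Keeping URL construction and response parsing separate from the network call lets the parsing be checked offline on a saved response before annotating a whole corpus.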
The methodology consists of: i) automatic textual analysis and annotation of the corpus based on the 417 ontologies available on the NCBO platform, from which we selected a subset of ontologies based on the number of identified concepts and evaluated their relevance with respect to the gold standard; and ii) semantic modeling of the automatically extracted concepts into a sustainable formal representation based on their UMLS semantic types.

Results
We identified NCIT, SNOMED-CT, NCI CaDSR Value Sets, LOINC and PathLex as the ontologies providing the highest number of annotated concepts. Table 1 shows, for each note, the percentage of its concepts covered by the annotations of each of the 5 reference ontologies. The percentages for a single note can add up to more than 100 because the coverages of different ontologies may overlap. Table 2 uses the same format but counts only the concepts from the gold standards. From the list of extracted concepts, we built a preliminary formal representation of the histopathological knowledge based on the UMLS semantic types of the concepts. Figure 1 shows the proposed semantic model for tubular differentiation. The novelty of this approach is the federation of knowledge from different sources (CAP-CC&P, NCBO ontologies and UMLS Metathesaurus) and the sustainable management of the associated semantics. This opens the perspective of building an AP observation ontology that will allow an accurate representation of AP reports understandable by both humans and software applications.
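The ontology selection and the per-note coverage percentages can be sketched as simple set arithmetic. This is a minimal illustration under stated assumptions, not the authors' implementation: concepts are compared by normalized label, and because the abstract does not state the exact denominator for Table 1, the hypothetical `union_reference` helper uses the union of concepts found by any ontology in the note, while a Table 2 style view passes the gold standard instead. All function names and the toy data are hypothetical.

```python
def union_reference(annotations):
    """Per note, the union of concepts found by any ontology
    (one possible denominator for a Table 1 style view; an assumption)."""
    return {note: set().union(*per_onto.values())
            for note, per_onto in annotations.items()}


def coverage_table(reference, annotations):
    """Percent of each note's reference concepts covered by each ontology.

    reference:   {note_id: set of concept labels}
    annotations: {note_id: {ontology: set of concept labels it annotated}}
    Rows can sum to more than 100 because ontologies may cover the same concept.
    """
    table = {}
    for note, ref in reference.items():
        row = {}
        for onto, found in annotations.get(note, {}).items():
            row[onto] = 100.0 * len(found & ref) / len(ref) if ref else 0.0
        table[note] = row
    return table


def rank_ontologies(annotations, reference=None):
    """Rank ontologies by the number of concepts annotated over all notes;
    if `reference` is given (e.g. the gold standard), count only those."""
    counts = {}
    for note, per_onto in annotations.items():
        for onto, found in per_onto.items():
            if reference is not None:
                found = found & reference.get(note, set())
            counts[onto] = counts.get(onto, 0) + len(found)
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy data for illustration only, not results from the study.
    gold = {"n1": {"mitotic count", "tubule formation",
                   "nuclear pleomorphism", "nottingham score"}}
    ann = {"n1": {
        "NCIT": {"mitotic count", "tubule formation", "nuclear pleomorphism"},
        "LOINC": {"mitotic count", "nottingham score", "specimen", "laterality"},
    }}
    print(coverage_table(gold, ann))   # gold-standard coverage per ontology
    print(rank_ontologies(ann))        # ranking by all annotated concepts
    print(rank_ontologies(ann, gold))  # ranking by gold-standard concepts only
```

On the toy data, NCIT covers 75% and LOINC 50% of the gold-standard concepts (125% in total, illustrating the overlap), and the two rankings differ: LOINC leads when all annotations are counted (4 vs 3), while NCIT leads when only gold-standard concepts are counted (3 vs 2). This is why ranking by raw annotation counts and evaluating against the gold standard are kept as separate steps.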

Background
Recently, anatomic pathology (AP) has seen the introduction of several tools, such as slide scanners and virtual slide technologies, creating the conditions for broader adoption of computer-aided diagnosis based on whole slide images (WSI). This change raises a number of new scientific challenges, such as the sustainable management of the explicit and unambiguous semantics associated with the diagnostic interpretation of AP images by both humans (pathologists) and computers (image analysis algorithms). To reduce inter-observer variability between AP reports of malignant tumors, the College of American Pathologists has published more than 60 organ-specific Cancer Checklists and associated Protocols (CAP-CC&P). Each checklist includes a set of AP observations that pathologists are expected to report in organ-specific AP cancer reports. Our objective was to i) identify the formalized histopathological knowledge available from NCBO BioPortal and the UMLS Metathesaurus within the scope of the CAP-CC&P for breast cancer grading and ii) build a sustainable visual representation of this knowledge using UMLS semantic types.

Methods
Our methodology was applied to the two breast cancer CAP-CC&Ps dedicated to invasive carcinoma (IC) and ductal carcinoma in situ (DCIS). We focused on a subset of quantifiable AP observations of the CAP checklists (CAP-CCs), i.e. observable entities that could be computed by image analysis tools, and on the corresponding protocol notes that unambiguously describe how pathologists should derive a high-level observation (e.g. Nottingham score) from low-level morphological characteristics observed in images (e.g. mitotic count or glandular/tubular differentiation). The notes were annotated manually by two AP experts (gold standard) and automatically by NCBO Annotator using the 508 ontologies available on the NCBO platform.
A subset of reference ontologies was selected according to their capacity to automatically identify concepts in the notes, and it was compared with the subset of ontologies selected according to their capacity to identify the concepts marked by the experts (gold standard). Once automatically extracted from the notes, the concepts belonging to different ontologies were integrated into a single graph and organized according to UMLS semantic types.

Results
The most relevant biomedical ontologies for annotating the notes that describe quantifiable observable entities of the breast cancer CAP-CC&Ps are SNOMED-CT, LOINC, NCIT, NCI CaDSR Value Sets and PathLex. A visual representation integrating 25 concepts from the 5 different ontologies, organized according to 11 UMLS semantic types, was built to help AP experts build a formal representation of the low-level quantifiable entities automatically extracted from the CAP-CC&P notes.

Conclusion
The proposed approach and tools, based on the CAP-CC&Ps, aim to support AP experts in building a standard-based representation of low-level morphological abnormalities observed in cancer that can be quantified using image analysis tools. This effort complements the Integrating the Healthcare Enterprise (IHE) initiative, which is building a standard-based representation of the high-level AP observations required in cancer AP reports. Additional efforts are needed to achieve a workable standard-based formal representation of histopathological knowledge that integrates both observable entities reported by humans (pathologists) and quantifiable entities computed automatically by machines. Providing such a single formal representation paves the way for more efficient use of computer-aided diagnosis in AP, as well as for the development of new biomarkers based on automatic analysis of whole slide images (WSI).
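The integration step in the Methods of this abstract, in which concepts from different ontologies are merged into one graph organized by UMLS semantic type, can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes each extracted concept arrives as a (label, ontology, semantic type) triple, merges concepts whose normalized labels match into one node (a naive normalization chosen for the example), and records which ontologies provide each node. The function names and the semantic type assignments in the example are hypothetical.

```python
from collections import defaultdict


def build_semantic_graph(concepts):
    """Merge concepts from several ontologies into one graph.

    concepts: iterable of (label, ontology, semantic_type) triples.
    Returns two dicts:
      groups  - semantic type -> set of concept nodes (type-to-concept edges)
      sources - concept node -> set of ontologies that provide it, so a
                concept annotated by several ontologies becomes a single node
    """
    groups = defaultdict(set)
    sources = defaultdict(set)
    for label, ontology, sem_type in concepts:
        node = label.strip().lower()  # naive label normalization (assumption)
        groups[sem_type].add(node)
        sources[node].add(ontology)
    return dict(groups), dict(sources)


def render(groups, sources):
    """Plain-text view of the graph: one block per semantic type,
    each concept followed by the ontologies that provide it."""
    lines = []
    for sem_type in sorted(groups):
        lines.append(sem_type)
        for node in sorted(groups[sem_type]):
            onts = ", ".join(sorted(sources[node]))
            lines.append(f"  - {node} ({onts})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical triples; the semantic type assignments are illustrative.
    triples = [
        ("Mitotic count", "NCIT", "Quantitative Concept"),
        ("mitotic count", "LOINC", "Quantitative Concept"),
        ("Tubule formation", "NCIT", "Finding"),
    ]
    print(render(*build_semantic_graph(triples)))
```

If two ontologies assign the same concept to different semantic types, the merged node simply appears under both types, which keeps such disagreements visible to the experts reviewing the graph rather than resolving them silently.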

Articles published on NCBO Annotator

Clinical concept recognition: Evaluation of existing systems on EHRs.

CafeteriaFCD Corpus: Food Consumption Data Annotated with Regard to Different Food Semantic Resources.

NLIMED: Natural Language Interface for Model Entity Discovery in Biosimulation Model Repositories.

Converting Biomedical Text Annotated Resources into FAIR Research Objects with an Open Science Platform

Enabling a fast annotation process with the Table2Annotation tool.

Identifying diseases-related metabolites using random walk

Enhanced functionalities for annotating and indexing clinical text with the NCBO Annotator.

Sustainable Formal Representation Of Breast Cancer Grading Histopathological Knowledge

A sustainable visual representation of available histopathological digital knowledge for breast cancer grading

Automatic concept recognition using the human phenotype ontology reference and test suite corpora.

Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters

NCBO Technology: Powering semantically aware applications

A Framework for Annotating Human Genome in Disease Context

Open semantic annotation of scientific publications using DOMEO
