Finding the Meaning in Images: Annotation and Image Markup

Daniel L. Rubin

Keywords: ontologies, semantic annotation, imaging, knowledge representation

Biomedical images and ontologies are closely related conceptually, yet currently they are studied in isolation. Biomedical ontologies provide a representation of the canonical entities considered in biomedical research and clinical observations, and the relations among them. Images reveal instances of those entities and, taken in aggregate, inform the construction of ontologies describing the pertinent domain content revealed in the images. The article by Fielding and Marwede (2011) notes the differences between the ontology of the body and the ontology of the image, developing toward an application of ontology of the psychiatric domain. Although such ontology development is important for knowledge representation, it is equally important to integrate such ontologies with the actual images to which they relate. In this commentary, we describe ongoing work to accomplish this linkage.

Connecting biomedical ontologies to images is an important activity. Biomedical images provide rich information, but the contents of images, such as the modality used to acquire them, the anatomy they contain, and visual observations made about them, are not explicit or computable. Image data are accumulating in a variety of online databases at an explosive pace, similar to nonimage data. But whereas nonimage data, such as genetic data, are easily processed by machines, image data are generally not exploited directly: images are typically stored in archives, and only the particular data needed for the study in which the images were originally acquired are generally available for subsequent analysis. Consequently, informatics methods are in development to enable the community to leverage the vast amounts of images accumulating as products of biomedical research.
The Challenges of Using Images in e-Science

There is growing interest in applying semantic web technologies to biomedicine, because these methods can make biomedical data explicit and computable. An "e-Science" paradigm is emerging, and the biomedical community is looking for tools to help them access, query, and analyze the myriad data available online. Specifically, they are beginning to embrace technologies for semantic scientific knowledge integration, such as ontologies (Bodenreider and Stevens 2006), standard syntaxes and semantics to make biomedical knowledge explicit, and the Semantic Web (Ruttenberg et al. 2007). These technologies are enabling the community to access large amounts of data, and to interoperate among diverse data archives. Such technologies are showing promise in tackling the information challenges in biomedicine, and a variety of applications are quickly appearing (Ruttenberg et al. 2007).

Although researchers can now access a broad diversity of biomedical data, a critical type of data, images, remains difficult to leverage. Those wanting to access and use imaging in their work face difficulties similar to those of the rest of the e-Science community, namely managing, finding, and using the voluminous imaging data accruing at an explosive pace. However, imaging poses unique challenges that hinder direct translation of the informatics methods currently being applied to nonimaging biomedical data.

Image Content Is Not Explicit and Machine Accessible

Images contain rich information about anatomy and abnormal structures; however, this is implicit knowledge that is deduced by the person viewing the image. For example, a researcher viewing an image may want to indicate where in the image particular areas of interest lie, and whether they are abnormal (Figure 1).
This information, the semantic image content, is often considered "image metadata," including observations about images, interpretations, and conclusions, and it is generally neither recorded in a structured manner nor directly linked to the image. Thus, images cannot be easily searched for their semantic content (e.g., find all images containing particular anatomy or representing particular abnormalities).

No Controlled Image Terminology or Standard Syntax for Image Information

There are no standard terminologies specifically for describing medical image contents (the imaging observations, the anatomy, and the pathology), and the syntax in which the information is recorded varies, with no widely adopted standards, resulting in limited interoperability. Descriptions of medical images are most frequently recorded as unstructured free text, limiting the ability of computers to analyze and access this information. Schemes for annotating images have been proposed in nonmedical domains...