2. Image computing for digital pathology

Shishir K Shah, Edgar Gabriel

doi:10.1109/icpr.2008.4760936

Abstract

Summary form only given. Pathologists and cancer biologists rely on tissue and cellular analysis to study cancer expression, genetic profiles, and cellular morphology, both to understand the underlying basis of a disease and to grade its progression. Conventional analysis of tissue histology and sample cytology involves four steps: examining the stained tissue or cell smear under a microscope; scoring the expression relative to the most highly expressing (most densely stained) area on a predefined scale for normal, cancer, and stromal regions, based on the morphology of the tissue; estimating the percentage area of cancer tissue relative to normal tissue and stroma; and multiplying the score by the percentage area of the cancer region and converting the result to another predefined scale for statistical analysis. Most of this analysis is done manually or with limited tools to aid the scoring process. Over the last 5 years, automated and semi-automated microscope slide scanners have become commercially available. These scanners rely on sophisticated microscopes and allow the entire sample to be digitized at varying magnifications. This has led to the emergence of digital pathology and a growing amount of image data. Each digitized sample is typically on the order of 2.7 GB to 10 GB in size, depending on the magnification of the digitizing system, with an image size of 30,000 × 30,000 pixels or larger. Further, current software and methods for automated scoring of tissue are very limited. This has led to increased interest in identifying novel solutions for automated histology and cytology analysis. To achieve high computational accuracy with reasonable turnaround times, novel approaches to data and resource management are also required to handle the image sizes outlined above.
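The sizes quoted above can be sanity-checked with simple arithmetic. The following is a minimal sketch, assuming an uncompressed image with three 8-bit color channels per pixel (the abstract does not state the pixel format, so this is an assumption); the function names and the 1,024-pixel tile size are hypothetical and chosen only for illustration.

```python
def uncompressed_size_bytes(width_px, height_px, channels=3, bytes_per_channel=1):
    """Raw size of an image, assuming no compression.

    The defaults (3 channels, 1 byte each) are an assumption: 24-bit RGB.
    """
    return width_px * height_px * channels * bytes_per_channel


def tile_grid(width_px, height_px, tile_px):
    """Number of (columns, rows) of square tiles needed to cover the image."""
    cols = -(-width_px // tile_px)   # ceiling division
    rows = -(-height_px // tile_px)
    return cols, rows


size = uncompressed_size_bytes(30_000, 30_000)
print(f"{size / 1e9:.1f} GB")        # decimal gigabytes

cols, rows = tile_grid(30_000, 30_000, 1_024)
print(f"{cols} x {rows} = {cols * rows} tiles")
```

Under these assumptions a 30,000 × 30,000 image occupies 2.7 × 10^9 bytes, which matches the lower end of the range given in the abstract. Splitting such an image into tiles is one common way to keep the working set small enough to process piecewise.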
Two developments in the computer industry make the current generation of scientists more likely to solve the performance challenges associated with large image sizes. First, the emergence of multi-core processors allows for parallel processing within a single PC using off-the-shelf components. It is virtually impossible today to buy a PC that does not have at least two computational cores. Furthermore, all major manufacturers have announced processors containing four, eight, and even sixteen cores for the next two years, providing an omnipresent potential for parallel processing on every desktop PC. Second, as of today, every medical or research institution of reasonable size in the US has a PC cluster in its computing arsenal. The next generation of digital pathology systems will require the ability to share and process data across disparate institutes. Hence, PC clusters would allow for parallelism by analyzing multiple images simultaneously. They would also offer an opportunity to speed up the analysis of a single image. However, exploiting the computational power of multi-core architectures and PC clusters requires modifications to existing sequential image analysis codes and careful evaluation of alternative and novel algorithms with respect to their potential for parallelism. This tutorial will provide an overview of the application domain and of its challenges. Specifically, opportunities for novel image analysis and pattern recognition algorithms that can leverage frameworks for shared and distributed parallel computing will be discussed, along with examples from ongoing research in the labs of the instructors.
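To make the idea of speeding up the analysis of a single image concrete, the sketch below splits an image into tiles and analyzes the tiles in parallel on a multi-core PC. It is an illustration only, not the instructors' method: the per-tile analysis (counting pixels darker than a threshold) is a hypothetical stand-in for a real algorithm, the image is a plain 2-D list of intensities, and all function names and parameters are assumptions.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split_into_tiles(image, tile):
    """Yield square blocks (lists of rows) from a 2-D list of pixel intensities."""
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield [row[c:c + tile] for row in image[r:r + tile]]


def count_stained(tile_pixels, threshold=100):
    """Count pixels darker than `threshold`.

    Hypothetical stand-in for a real per-tile analysis step.
    """
    return sum(1 for row in tile_pixels for p in row if p < threshold)


def count_stained_parallel(image, tile=2, executor_cls=ProcessPoolExecutor, workers=2):
    """Analyze each tile independently and combine the partial results.

    ProcessPoolExecutor uses separate processes, so tiles can run on
    different cores; ThreadPoolExecutor can be passed instead for quick
    testing, although threads do not speed up pure-Python computation.
    """
    tiles = list(split_into_tiles(image, tile))
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(count_stained, tiles))
```

This works because the example analysis is independent per tile, so the partial counts can simply be summed. Algorithms whose results depend on neighboring tiles (for example, segmenting nuclei that straddle a tile boundary) need overlap or a merge step, which is the kind of restructuring of sequential code the abstract refers to. When using `ProcessPoolExecutor` from a script, the call is typically placed under an `if __name__ == "__main__":` guard.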
Topics to be covered will include:
1) Overview of digital pathology
2) Applications and challenges (immunohistochemistry, H&E analysis, FISH, alternate image modalities such as spectral imaging, histology, cytology)
3) Architectural developments (multi-core, networking, GPU processing, storage)
4) Image analysis pipelines and algorithms (image segmentation, nuclei detection, morphometric features, karyometric features, ...)
5) Performance implications (data management, exploiting multi-core processors, exploiting PC clusters)
6) Emerging trends and applications

Full Text