Abstract

While the number of pathologists is dropping dramatically, the number of pathological cases to be examined is rising sharply, mainly because of early screening campaigns; automated systems would thus help pathologists in their daily work. Virtual Microscopy (VM) is increasingly being introduced in pathology departments [1], where it holds immense potential despite the large amounts of data to be managed. Combined with image processing techniques, it can help to find objective criteria for differential diagnosis or to quantify prognostic markers. Many studies therefore aim to develop computer-aided diagnosis systems (CADS) based on image retrieval and classification [2,3]. The first step consists in building a knowledge database from many features extracted from a set of well-characterized images; it is an 'off-line' procedure conducted once. These features are represented by vectors of non-linear data that act as signatures for the original images. In a second step, signatures are computed from new, unknown images and compared with the database; it is an 'on-line' procedure. Because of tumor heterogeneity, it is essential to build knowledge databases containing representative features of the multiple morphological types of lesions before implementing a CADS. However, as it is almost impossible for a pathologist to manually segment large virtual slide images (VSI), the usual practice consists in manually selecting some 'representative areas'. This choice is inherently subjective and therefore introduces a bias into the process. Better solutions are thus needed to obtain an unbiased collection of these 'representative areas' (hereafter called 'patches').
In a previous work [4], we proposed an original strategy: starting from a collection of breast cancer VSI and taking advantage of stereological sampling methods and diffusion maps, a knowledge database is built from a reduced number of patches that are representative of given histological types. The sampling tools offered by stereology are well suited to this context [5]. Systematic sampling, which starts from a random point and proceeds at a fixed periodic interval, reduces the area to be analyzed while preserving the collection of distinctive regions encountered in a tumor. However, even if the working area becomes smaller, the number of selected patches can still be very large and may include many redundant elements. A data reduction step is therefore required. Among the available methods, the diffusion maps technique [6,7] was retained because it provides a very attractive framework for processing and visualizing large volumes of non-linear data. Diffusion maps belong to the family of unsupervised learning algorithms based on a spectral analysis of non-linear data; they provide a clustering only for the given training points, with no straightforward extension to out-of-sample cases. The work presented here focuses on a way around this problem and explains how unknown VSI can be classified by viewing diffusion maps as learning the eigenfunctions of a data-dependent kernel. It uses the Nyström formula to estimate the diffusion coordinates of new data [8]. An application to histological types of breast cancer is presented, with VSI of Invasive Ductal Carcinoma and Mastosis.
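
The three ingredients named above (systematic sampling from a random start, a diffusion map fitted on training signatures, and a Nyström estimate of diffusion coordinates for new data) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Gaussian kernel, the bandwidth `eps`, the diffusion time `t`, and the row-stochastic normalization are all assumptions chosen for illustration, and the feature vectors are taken as already extracted.

```python
import numpy as np

def systematic_patch_origins(width, height, patch, step, rng):
    """Top-left corners of patches on a periodic grid with a random start point."""
    x0 = int(rng.integers(0, step))  # random offset in [0, step)
    y0 = int(rng.integers(0, step))
    xs = np.arange(x0, width - patch + 1, step)
    ys = np.arange(y0, height - patch + 1, step)
    return [(int(x), int(y)) for y in ys for x in xs]

def _gaussian_kernel(A, B, eps):
    """Pairwise Gaussian affinities (assumed kernel, not taken from the paper)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / eps)

def fit_diffusion_map(X, eps, n_coords=2, t=1):
    """Diffusion coordinates of the training signatures X (n x d)."""
    K = _gaussian_kernel(X, X, eps)
    d = K.sum(axis=1)
    # Symmetric matrix with the same spectrum as the Markov matrix P = D^-1 K.
    S = K / np.sqrt(np.outer(d, d))
    lam, V = np.linalg.eigh(S)
    # Sort by decreasing eigenvalue and skip the trivial one (equal to 1).
    order = np.argsort(lam)[::-1][1:n_coords + 1]
    lam = lam[order]
    psi = V[:, order] / np.sqrt(d)[:, None]  # right eigenvectors of P
    return {"X": X, "eps": eps, "t": t, "lam": lam, "psi": psi,
            "coords": psi * lam ** t}

def nystrom_extend(Y, model):
    """Nystrom estimate of diffusion coordinates for new signatures Y (m x d)."""
    K = _gaussian_kernel(Y, model["X"], model["eps"])
    P = K / K.sum(axis=1, keepdims=True)  # transition weights to training points
    psi_new = (P @ model["psi"]) / model["lam"]
    return psi_new * model["lam"] ** model["t"]
```

In this sketch the extension is exact on the training set: because each retained eigenvector satisfies P psi = lambda psi, calling `nystrom_extend(model["X"], model)` returns `model["coords"]` up to rounding. A new signature is placed at a kernel-weighted average of the training coordinates, rescaled by 1/lambda, so it can then be assigned to the nearest cluster of reference patches without recomputing the eigendecomposition.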

Highlights

  • Table 1 shows that feature extraction is O(n) while the spectral analysis is close to O(n^3); note that the latter involves both the eigenvector decomposition and the code for managing the computer-aided diagnosis system (CADS)

  • This work is the second part of a CADS that we aim to develop, based on an original strategy that starts from virtual slides (VS) and leads to an unbiased knowledge database containing reference patches of breast tumors
