Despite advances in deep learning, robust medical image segmentation in the presence of artifacts, pathology, and other imaging imperfections remains a challenge. In this paper, we demonstrate that these challenges can be largely overcome by synergistically combining the strengths of high-level human knowledge, i.e., natural intelligence (NI), with the capability of deep learning (DL) networks, i.e., artificial intelligence (AI), to capture intricate details. Focusing on the object recognition task, we formulate an anatomy-guided deep learning object recognition approach, named AAR-DL, which combines an advanced anatomy-modeling strategy, model-based non-deep-learning object recognition, and DL object detection networks to achieve expert-human-like performance.

The AAR-DL approach consists of four key modules in which prior knowledge (NI) is used judiciously at every stage. In the first module, AAR-R, objects are recognized based on a previously built fuzzy anatomy model of the body region with all of its organs, following the automatic anatomy recognition (AAR) approach in which high-level human anatomic knowledge is precisely codified. This module is purely model-based, with no DL involvement. Although AAR-R lacks accuracy, it is robust to artifacts and deviations (much like NI) and provides the much-needed anatomic guidance, in the form of rough regions of interest (ROIs), for the subsequent DL modules. The second module, DL-R, uses this ROI information to limit the search to the region where each object is most likely to reside and performs DL-based detection of 2D bounding boxes (BBs) in slices. The 2D BBs hug the shape of the 3D object much more tightly than a 3D BB, and their detection is feasible only because of the anatomy guidance from AAR-R. In the third module, the AAR model is deformed via the detected 2D BBs, yielding refined model information that embodies both NI and AI decisions. The refined AAR model then more actively guides the fourth module, refined DL-R, to perform the final object detection via DL. Anatomic knowledge is also exploited in designing the DL networks, with spatially sparse objects and non-sparse objects handled differently so that each receives the required level of attention.

Utilizing 150 thoracic and 225 head and neck (H&N) computed tomography (CT) data sets from cancer patients undergoing routine radiation therapy planning, the recognition performance of AAR-DL is evaluated on 10 thoracic and 16 H&N organs in comparison to the pure model-based approach (AAR-R) and a pure DL approach without anatomy guidance. Recognition accuracy is assessed via location (centroid distance) error, scale (size) error, and wall distance error.

The results demonstrate how the errors are gradually and systematically reduced from the first module to the fourth as high-level knowledge is infused via NI at various stages of the processing pipeline. This improvement is especially dramatic for sparse, artifact-prone, and otherwise challenging objects, achieving a location error over all objects of 4.4 mm and 4.3 mm for the two body regions, respectively. The pure DL approach failed on several very challenging sparse objects on which AAR-DL achieved accurate recognition, almost matching human performance, demonstrating the importance of anatomy guidance for robust operation. Anatomy guidance also considerably reduces the time required for training the DL networks.
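To make the four-module flow explicit, it can be summarized in a minimal structural sketch in Python. This is our illustration only, not the authors' implementation; every name here (aar_dl_recognize, fuzzy_model, dl_detector, refined_dl_detector, and the methods on fuzzy_model) is a hypothetical placeholder for the components the abstract describes.

```python
from typing import Callable, Dict

def aar_dl_recognize(
    volume,                         # 3D CT volume
    fuzzy_model,                    # codified AAR fuzzy anatomy model (NI)
    dl_detector: Callable,          # per-slice 2D BB detector (AI)
    refined_dl_detector: Callable,  # final, model-guided DL detector (AI)
) -> Dict[str, object]:
    # Module 1 (AAR-R): purely model-based recognition -> rough ROIs, no DL.
    rois = {organ: fuzzy_model.recognize(volume, organ)
            for organ in fuzzy_model.organs}
    # Module 2 (DL-R): detect 2D bounding boxes in slices, searching only
    # inside the ROI where each organ is most likely to reside.
    bbs_2d = {organ: dl_detector(volume, roi) for organ, roi in rois.items()}
    # Module 3: deform the AAR model to the detected 2D BBs, fusing the
    # NI (model) and AI (detector) decisions into a refined model.
    refined_model = fuzzy_model.deform_to(bbs_2d)
    # Module 4 (refined DL-R): final DL detection guided by the refined model.
    return {organ: refined_dl_detector(volume, refined_model.roi(organ))
            for organ in fuzzy_model.organs}
```

Per the abstract, the detector networks themselves would additionally be configured differently for spatially sparse versus non-sparse objects; that design choice is orthogonal to the pipeline structure shown above.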
Key findings: (i) High-level anatomy guidance improves the recognition performance of DL methods. (ii) This improvement is especially noteworthy for spatially sparse, low-contrast, inconspicuous, and artifact-prone objects. (iii) Once anatomy guidance is provided, 3D objects can be detected much more accurately via 2D BBs than 3D BBs, and the 2D BBs represent object containment with much greater specificity (see the numerical sketch after this list). (iv) Anatomy guidance brings stability and robustness to DL approaches for object localization. (v) Training time can be greatly reduced by making use of anatomy guidance.
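To illustrate finding (iii) numerically, the following self-contained sketch (ours, not from the paper) compares the tightness of a single axis-aligned 3D BB against a stack of per-slice 2D BBs for a synthetic, diagonally drifting tubular object, a stand-in for sparse anatomy such as a vessel:

```python
import numpy as np

def bb_specificity(mask: np.ndarray) -> tuple[float, float]:
    """Fraction of box voxels occupied by the object, for one 3D BB
    versus a stack of per-slice 2D BBs. Higher means a tighter box."""
    obj = mask.sum()
    # Single axis-aligned 3D bounding box over the whole object.
    zs, ys, xs = np.nonzero(mask)
    vol_3d = ((zs.max() - zs.min() + 1) *
              (ys.max() - ys.min() + 1) *
              (xs.max() - xs.min() + 1))
    # One 2D bounding box per slice that actually contains the object.
    vol_2d = 0
    for sl in mask:
        if sl.any():
            yy, xx = np.nonzero(sl)
            vol_2d += (yy.max() - yy.min() + 1) * (xx.max() - xx.min() + 1)
    return obj / vol_3d, obj / vol_2d

# Toy object: a 4x4 cross-section drifting diagonally across 40 slices.
mask = np.zeros((40, 64, 64), dtype=bool)
for z in range(40):
    mask[z, 10 + z:14 + z, 10 + z:14 + z] = True
print(bb_specificity(mask))  # prints roughly (0.0087, 1.0)
```

For this toy object, the single 3D BB is less than 1% object and over 99% background, whereas the per-slice 2D BBs contain the object exactly, which is why 2D BBs hug sparse 3D anatomy with so much more specificity.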