Abstract

One major challenge in medical imaging analysis is the lack of labels and annotations, which usually require medical knowledge and training. This issue is particularly serious in brain image analysis, such as the analysis of retinal vasculature, which directly reflects the vascular condition of the Central Nervous System (CNS). In this paper, we present a novel semi-supervised learning algorithm that boosts the performance of random forest under limited labeled data by exploiting the local structure of unlabeled data. We identify the key bottleneck of random forest to be the information gain calculation and replace it with a graph-embedded entropy, which is more reliable when labeled data are insufficient. By properly modifying the training process of the standard random forest, our algorithm significantly improves performance while preserving the virtues of random forest, such as low computational burden and robustness against over-fitting. Our method shows superior performance on both medical imaging analysis and machine learning benchmarks.

Highlights

  • Machine learning has been widely applied to analyze medical images such as an image of the brain

  • We evaluate the proposed method on both neuronal image analysis and retinal image analysis, the latter being highly related to diabetic retinopathy (DR) (Niu et al, 2019) and Alzheimer’s Disease (AD) (Liao et al, 2018)

  • We propose a novel semi-supervised random forest to tackle the challenging problem of missing annotations in medical image analysis, such as the analysis of brain images


Summary

INTRODUCTION

Machine learning has been widely applied to analyze medical images such as images of the brain. Collecting raw data during routine screening is possible, but making annotations and diagnoses for them is costly and time-consuming for medical experts. To deal with this challenge, we propose a novel graph-embedded semi-supervised algorithm that makes use of the unlabeled data to boost the performance of the random forest.

1. We empirically validate that the performance bottleneck of random forest under limited training samples is the biased information gain calculation.

3. We propose a novel semi-supervised random forest which shows a performance advantage over the state of the art in both medical imaging analysis and machine learning benchmarks.

Since a major part of training and the whole of testing remain unchanged, our graph-embedded random forest could significantly improve the performance without losing the virtues of a standard random forest, such as low computational burden and robustness against over-fitting.
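For readers unfamiliar with the quantity named as the bottleneck, the following is a minimal sketch of the standard entropy-based information gain used to score a candidate split in a decision tree. It is not the paper's graph-embedded entropy, which this summary does not spell out; the function names `entropy` and `information_gain` and the toy data are illustrative assumptions. Because the gain is estimated only from labeled samples, a handful of labels can make the estimate unreliable.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels, threshold):
    """Entropy reduction from splitting samples at feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


# Hypothetical toy data: four labeled samples with one scalar feature.
feature = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
gain_clean = information_gain(feature, labels, 0.5)   # separates the classes
gain_poor = information_gain(feature, labels, 0.15)   # leaves one side mixed
```

In this toy case the split at 0.5 separates the two classes perfectly and so attains the maximum gain of one bit, while the split at 0.15 scores lower. With so few labeled points, relabeling a single sample would change both scores substantially, which is the kind of instability the text attributes to limited training samples.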

ANALYSIS OF PERFORMANCE
Performance Bottleneck Under
GRAPH-EMBEDDED REPRESENTATION
CONSTRUCTION OF SEMI-SUPERVISED RANDOM FOREST
EXPERIMENTS
Quantitative Analysis
Findings
CONCLUSION
