Abstract
One major challenge in medical image analysis is the lack of labels and annotations, which usually require medical knowledge and training to produce. This issue is particularly serious in brain image analysis, such as the analysis of retinal vasculature, which directly reflects the vascular condition of the Central Nervous System (CNS). In this paper, we present a novel semi-supervised learning algorithm that boosts the performance of random forest under limited labeled data by exploiting the local structure of unlabeled data. We identify the key bottleneck of random forest to be the information gain calculation and replace it with a graph-embedded entropy, which is more reliable when labeled data are insufficient. By properly modifying the training process of the standard random forest, our algorithm significantly improves performance while preserving the virtues of random forest, such as low computational burden and robustness against over-fitting. Our method shows superior performance on both medical image analysis and machine learning benchmarks.
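For readers unfamiliar with the quantity the abstract identifies as the bottleneck, the following is a minimal sketch of the standard entropy-based information gain used to score a candidate split in a decision tree. The function names `entropy` and `information_gain` are illustrative choices, not taken from the paper, and the toy labels are made up for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Toy node with four labeled samples (hypothetical data).
parent = [0, 0, 1, 1]
gain_perfect = information_gain(parent, [0, 0], [1, 1])  # classes fully separated
gain_useless = information_gain(parent, [0, 1], [0, 1])  # classes still mixed
print(gain_perfect, gain_useless)
```

Because the estimate depends only on the labeled samples that reach a node, a handful of labels can make a poor split look perfect, which is the kind of unreliability the abstract refers to.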
Highlights
Machine learning has been widely applied to analyze medical images such as brain images
We evaluate the proposed method on both neuronal image analysis and retinal image analysis, the latter being highly related to diabetic retinopathy (DR) (Niu et al., 2019) and Alzheimer’s Disease (AD) (Liao et al., 2018)
We propose a novel semi-supervised random forest to tackle the challenging problem of lacking annotations in medical image analysis, such as brain image analysis
Summary
Machine learning has been widely applied to analyze medical images such as brain images. Collecting raw data during routine screening is feasible, but annotating and diagnosing them is costly and time-consuming for medical experts. To deal with this challenge, we propose a novel graph-embedded semi-supervised algorithm that makes use of unlabeled data to boost the performance of the random forest.
1. We empirically validate that the performance bottleneck of random forest under limited training samples is the biased information gain calculation.
3. We propose a novel semi-supervised random forest that shows a performance advantage over the state of the art in both medical image analysis and machine learning benchmarks.
Since a major part of training and the whole of testing remain unchanged, our graph-embedded random forest can significantly improve performance without losing the virtues of a standard random forest, such as low computational burden and robustness against over-fitting.
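The summary does not spell out how the graph-embedded entropy is computed, so the sketch below is not the authors' formulation. It illustrates the general idea with a generic stand-in: labels are spread over a k-nearest-neighbour graph by simple label propagation, and node entropy is then computed from the resulting soft labels so that unlabeled points contribute to the estimate. All function names, the choice of binary kNN weights, and the toy data are assumptions made for this example.

```python
import numpy as np

def knn_affinity(X, k=2):
    """Symmetric k-nearest-neighbour affinity matrix with binary weights."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    W = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :k]     # indices of the k closest points
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, idx.ravel()] = 1.0
    return np.maximum(W, W.T)              # symmetrise the graph

def propagate(W, y, n_classes, n_iter=50):
    """Spread labels over the graph; y == -1 marks unlabeled points."""
    F = np.zeros((len(y), n_classes))
    labeled = y >= 0
    F[labeled, y[labeled]] = 1.0
    clamp = F[labeled].copy()
    P = W / W.sum(axis=1, keepdims=True)   # row-normalised transition matrix
    for _ in range(n_iter):
        F = P @ F
        F[labeled] = clamp                 # keep the known labels fixed
    return F

def soft_entropy(F):
    """Entropy (bits) of the aggregated soft-label distribution of a node."""
    p = F.sum(axis=0)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical toy data: two 1-D clusters with one labeled point each.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, -1, -1, 1, -1, -1])
F = propagate(knn_affinity(X, k=2), y, n_classes=2)
print(F.argmax(axis=1), soft_entropy(F[:3]), soft_entropy(F))
```

In a tree-training loop, a quantity like `soft_entropy` could take the place of the hard-label entropy when scoring splits, leaving the rest of training and all of testing untouched, which matches the summary's description only at this high level.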