Abstract

Classification is one of the most important tasks in machine learning. Because of feature redundancy or outliers among samples, using all available data to train a classifier may be suboptimal. For example, Alzheimer’s disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), and identifying the relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNPs and then use those features to build the classifier. However, in the presence of many redundant features, the most discriminative features are difficult to identify in a single step. Thus, we formulate a hierarchical feature and sample selection framework that gradually selects informative features and discards ambiguous samples over multiple steps for improved classifier learning. To positively guide the data manifold preservation process, we use both labeled and unlabeled data during training, making our method semi-supervised. For validation, we conduct experiments on AD diagnosis by selecting mutually informative features from both MRI and SNP data and using the most discriminative samples for training. The superior classification results demonstrate the effectiveness of our approach compared with competing methods.

Highlights

  • As one of the most common neurodegenerative diseases found in the elderly, Alzheimer’s disease (AD) accounts for up to 70% of dementia cases[3].

  • Besides structural magnetic resonance imaging (MRI), other imaging modalities such as functional MRI can be used in AD/mild cognitive impairment (MCI) diagnosis[19,20,21,22,23], as they provide additional functional information about hypometabolism and specific protein quantification, which can be beneficial for disease diagnosis.

  • For the unlabeled data in our method, we choose subjects that are irrelevant to the current classification task; e.g., when we classify AD versus normal control (NC), the data from MCI subjects are used as unlabeled data.

Summary

Introduction

As one of the most common neurodegenerative diseases found in the elderly, Alzheimer’s disease (AD) accounts for up to 70% of dementia cases[3]. Computer-aided diagnosis, including that for AD/MCI, often faces the challenge that the data dimensionality is much higher than the number of samples available for model training[30]. This imbalance between the number of features and the sample size may affect the learning of a classification model for disease prediction, or of a regression model for clinical score prediction. In MRI-based diagnosis, features are usually generated by segmenting the brain into different regions-of-interest (ROIs)[29]. As some of the ROIs may be irrelevant to AD/MCI, feature selection can be conducted to identify the most relevant brain regions so that the classification model can be learned more effectively. It is preferable to use only the most discriminative features from both MRI and SNPs for classification model training.
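The idea of pruning features and samples gradually, rather than in a single step, can be illustrated with a minimal Python sketch. This is not the authors' formulation: it swaps in plain ridge regression for whatever objective the paper uses, ignores the unlabeled data and manifold term, and the function name `hierarchical_select` and all of its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def hierarchical_select(X, y, n_steps=3, feat_keep=0.5, samp_keep=0.9, ridge=1.0):
    """Iteratively drop low-weight features and high-residual samples.

    X: (n_samples, n_features) array, ideally with standardized columns.
    y: (n_samples,) labels coded as -1 / +1.
    Returns the indices of the retained features and samples.
    Illustrative stand-in only, not the method described in the paper.
    """
    feat_idx = np.arange(X.shape[1])
    samp_idx = np.arange(X.shape[0])
    for _ in range(n_steps):
        Xs = X[np.ix_(samp_idx, feat_idx)]
        ys = y[samp_idx]
        # Closed-form ridge regression: w = (Xs^T Xs + ridge * I)^-1 Xs^T ys
        A = Xs.T @ Xs + ridge * np.eye(Xs.shape[1])
        w = np.linalg.solve(A, Xs.T @ ys)
        # Treat samples with the largest residuals as ambiguous and drop them
        resid = np.abs(Xs @ w - ys)
        n_s = max(2, int(np.ceil(samp_keep * len(samp_idx))))
        keep_s = np.sort(np.argsort(resid)[:n_s])
        # Treat features with the smallest absolute weights as redundant
        n_f = max(1, int(np.ceil(feat_keep * len(feat_idx))))
        keep_f = np.sort(np.argsort(-np.abs(w))[:n_f])
        samp_idx = samp_idx[keep_s]
        feat_idx = feat_idx[keep_f]
    return feat_idx, samp_idx

# Synthetic data: only the first two of 40 features carry the label signal
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 40))
y = np.sign(X[:, 0] + X[:, 1])
feats, samps = hierarchical_select(X, y)
print(len(feats), "features and", len(samps), "samples retained")
```

Halving the feature set at each step, instead of jumping straight to the final count, lets the weights be re-estimated after the noisiest features and samples are gone, which is the intuition behind selecting in multiple steps.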

