Abstract

Background
Fluorodeoxyglucose positron emission tomography (FDG‐PET) is an established modality for the differential diagnosis of dementia. Deriving phenotypic signatures from FDG‐PET via machine learning is challenging because of the high dimensionality of brain images relative to the generally small number of labeled examples available for training, the class imbalance among those labels, and the co‐occurrence of multiple pathologies. In this study, we developed a multi‐class, multi‐label framework to address these challenges.

Method
A database of clinically acquired PET/CT images from 3,000 unique patients was used to develop a latent space model using matrix decomposition. This model was then applied to images from a separate cohort of Mayo Clinic Alzheimer’s Disease Research Center participants (n=1,745) labeled as cognitively unimpaired (CU) (n=1,436) or with the following potentially co‐occurring phenotypes: Alzheimer’s disease (AD) (n=165), dementia with Lewy bodies (DLB) (n=92), behavioral variant frontotemporal dementia (bvFTD) (n=43), semantic variant (svPPA) (n=10) and logopenic variant (lvPPA) (n=13) primary progressive aphasia (PPA), and posterior cortical atrophy (PCA) (n=17). A k‐nearest neighbors classifier that is robust to these imbalanced and overlapping labels was then trained on these examples. The resulting classifier was evaluated by area under the receiver operating characteristic curve (ROC‐AUC) via leave‐one‐out cross‐validation, using clinical diagnosis as the gold standard.

Result
ROC curves and AUC scores for each phenotype are illustrated in Fig. 1a. Because the classifier is based on a k‐nearest neighbors connectivity matrix, it has a convenient graphical representation, in which images are nodes and edges are drawn between an image and its set of nearest neighbors in latent space. A self‐organizing, force‐directed graph constructed in this way is illustrated in Fig. 1b, highlighting the strong separation of CU and degenerative images, as well as the segregation of each phenotype within the neurodegenerative region of the graph.

Conclusion
In this study, we developed a machine learning framework for classification of neurodegenerative disease based on k‐nearest neighbors analysis in a low‐dimensional latent space projection of FDG‐PET images. By leveraging low‐dimensional representations and k‐nearest neighbors analysis, this framework is robust in multi‐class, multi‐label tasks with strong class imbalance and provides a highly interpretable graphical representation.
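
The evaluation described in the Method section can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the value of k, the distance metric, or how robustness to class imbalance is achieved, so this example assumes plain Euclidean distance, k=5, and unweighted neighbor voting. The latent coordinates are assumed to have been produced already (for example, by a matrix decomposition); here they are replaced by synthetic 2‐D toy data, and all function names are hypothetical.

```python
import math
import random


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def loo_label_scores(points, labels, k):
    """Leave-one-out multi-label scores.

    labels[i] is a set of phenotype names. The score of a label for image i
    is the fraction of its k nearest neighbours (i excluded) carrying it.
    """
    names = sorted(set().union(*labels))
    scores = []
    for i in range(len(points)):
        nbrs = knn_indices(points, i, k)
        scores.append({n: sum(n in labels[j] for j in nbrs) / k for n in names})
    return scores


def roc_auc(y_true, y_score):
    """ROC-AUC as the chance that a positive outranks a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t]
    neg = [s for t, s in zip(y_true, y_score) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(seed=0):
    """Synthetic, imbalanced 2-D 'latent' points (assumption, not study data)."""
    rng = random.Random(seed)
    centres = {"CU": (0.0, 0.0), "AD": (4.0, 0.0), "DLB": (0.0, 4.0)}
    counts = {"CU": 60, "AD": 12, "DLB": 8}
    points, labels = [], []
    for name, n in counts.items():
        cx, cy = centres[name]
        for _ in range(n):
            points.append((rng.gauss(cx, 0.7), rng.gauss(cy, 0.7)))
            labels.append({name})
    # A few images with co-occurring labels, placed between the two clusters.
    for _ in range(4):
        points.append((rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)))
        labels.append({"AD", "DLB"})
    return points, labels


points, labels = make_toy_data()
scores = loo_label_scores(points, labels, k=5)
aucs = {
    name: roc_auc([name in l for l in labels], [s[name] for s in scores])
    for name in ("CU", "AD", "DLB")
}
for name, auc in aucs.items():
    print(f"{name}: LOO ROC-AUC = {auc:.3f}")
```

The neighbor lists returned by `knn_indices` are also the edges of the graph described in the Result section: each image is a node connected to its k nearest neighbors, and such an edge list could be passed to any force‐directed layout tool for visualization.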