Early classification and prediction of Alzheimer disease (AD) and amnestic mild cognitive impairment (aMCI) with noninvasive approaches is a long-standing challenge, further exacerbated by the sparsity of the data needed for modeling. Deep learning methods offer a novel approach to these multiclass classification and prediction problems. We analyzed 3 target feature sets from the National Alzheimer's Coordinating Center (NACC) dataset: (1) neuropsychological (cognitive) data; (2) patient health history data; and (3) the combination of both sets. We used a masked Transformer-encoder, without further feature selection, to classify the samples by cognitive status (no cognitive impairment, aMCI, or AD) while dynamically ignoring unavailable features. We then fine-tuned the model to predict the participants' future diagnosis 1 to 3 years ahead. We analyzed the sensitivity of the model to input features via feature permutation importance. We demonstrated that (1) the masked Transformer-encoder was able to perform prediction with sparse input data; (2) multiclass classification of current cognitive status reached high accuracy (87% control, 79% aMCI, 89% AD); and (3) multiclass prediction of cognitive status 1 to 3 years ahead gave acceptable results (83% control, 77% aMCI, 91% AD). The flexibility of our methods in handling inconsistent data provides a new avenue for the analysis of cognitive status data.
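To illustrate how an attention layer can ignore unavailable features, the following is a minimal NumPy sketch of key masking in single-head self-attention. It is not the model used in this study: the function names `masked_self_attention` and `masked_mean_pool`, the one-token-per-feature layout, and the omission of learned projections are all assumptions made for illustration.

```python
import numpy as np


def masked_self_attention(x, available):
    """Single-head scaled dot-product self-attention with key masking.

    Illustrative assumption: each input feature is one token embedding.
    x:         (n_features, d) float array of feature-token embeddings
    available: (n_features,) bool array, False where the feature is missing
    Returns (output, attention_weights).
    """
    available = np.asarray(available, dtype=bool)
    if not available.any():
        raise ValueError("at least one feature must be available")

    # Zero out missing rows so placeholder values (even NaN) cannot leak in.
    x = np.where(available[:, None], x, 0.0)
    d = x.shape[-1]

    # Pairwise similarity between every query token and every key token.
    scores = x @ x.T / np.sqrt(d)

    # Missing features must never be attended to: mask their key columns.
    scores = np.where(available[None, :], scores, -np.inf)

    # Numerically stable softmax over keys; masked keys get weight 0.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ x, weights


def masked_mean_pool(h, available):
    """Average token representations over available features only."""
    available = np.asarray(available, dtype=bool)
    return h[available].mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(5, 4))  # 5 hypothetical features, width 4
    mask = np.array([True, False, True, True, False])
    out, attn = masked_self_attention(tokens, mask)
    print(attn.round(3))  # columns 1 and 4 are zero
    print(masked_mean_pool(out, mask))
```

Because masked keys receive zero attention weight and missing rows are excluded from pooling, the pooled representation depends only on the features that were actually recorded for a sample, whatever placeholder value the missing entries hold.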