Objective: Existing research has shown that neuropsychiatric/behavioral-psychological symptoms of dementia (BPSD) frequently contribute to worse prognosis in patients with neurodegenerative conditions (e.g., increased functional dependence, worse quality of life, greater caregiver burden, faster disease progression). BPSD are most commonly measured with the Neuropsychiatric Inventory (NPI) or its briefer, informant-rated questionnaire form (NPI-Q). Despite the NPI-Q's common use in research and practice, the literature is inconsistent about its latent structure and reliability, possibly because of methodological differences between studies. In addition, hierarchical factor models have not been considered, even though such models are gaining favor in the psychopathology literature. We therefore compared factor structures proposed in the current literature using confirmatory factor analyses (CFAs) to help determine the best latent model of the NPI-Q.

Participants and Methods: The sample included 20,500 individuals (57% female; 80% White, 12% Black, 8% Hispanic) with a mean age of 71 years (SD = 10.41) and a mean of 15 years of education (SD = 3.43). Individuals were included if they had completed an NPI-Q at their first visit to one of 33 Alzheimer Disease Research Centers reporting to the National Alzheimer Coordinating Center (NACC). All CFA and reliability analyses were performed with the lavaan and semTools R packages using a diagonally weighted least squares (DWLS) estimator. Eight single-level models based on full or modified versions of the NPI-Q were compared, and the top three were then tested in bifactor form.

Results: CFAs showed that all factor models of the full NPI-Q fit well across multiple indices (SRMR = 0.039-0.052, RMSEA = 0.025-0.029, CFI = 0.973-0.983, TLI = 0.967-0.977).
Modified forms of the NPI-Q also fit well across multiple indices (SRMR = 0.025-0.052, RMSEA = 0.018-0.031, CFI = 0.976-0.993, TLI = 0.968-0.989). When the top factor models were tested in bifactor form, all of them fit consistently better, whether based on the full form (SRMR = 0.023-0.035, RMSEA = 0.015-0.020, CFI = 0.992-0.995, TLI = 0.985-0.991) or a modified form (SRMR = 0.023-0.042, RMSEA = 0.015-0.024, CFI = 0.985-0.995, TLI = 0.977-0.992). Siafarikas and colleagues' (2018) 3-factor model had the best fit among the full-form models, whereas Sayegh and Knight's (2014) 4-factor model had the best fit among all single-level models and among the bifactor models.

Conclusions: Although all factor models showed adequate fit, the Sayegh and Knight 4-factor model had the strongest fit among both single-level and bifactor models. Furthermore, all bifactor models fit consistently better than their single-level counterparts, suggesting that BPSD are best explained theoretically by a hierarchical, non-nested framework of general and specific contributors to symptoms. These findings can also guide more consistent use of NPI-Q subscales.
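Of the fit indices reported above, RMSEA, CFI, and TLI are derived from chi-square statistics: the target model's chi-square and degrees of freedom, plus those of a baseline (independence) model for CFI and TLI. The minimal Python sketch below computes them with the textbook formulas. It rests on several assumptions and is not the lavaan/semTools implementation. With a DWLS estimator, lavaan reports versions of these indices based on the DWLS or scaled test statistic. Programs also differ on whether the RMSEA denominator uses N or N - 1. The chi-square inputs are hypothetical values for illustration, not results from this study. SRMR is omitted because it is computed from residual correlations rather than from chi-square.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation from a model chi-square.

    Uses the (n - 1) convention; some software uses n instead.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index: target model (m) relative to baseline model (b)."""
    d_m = max(chi2_m - df_m, 0.0)          # target model's excess misfit
    d_b = max(chi2_b - df_b, d_m)          # = max(chi2_b - df_b, chi2_m - df_m, 0)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index (non-normed fit index); not truncated, so it can exceed 1."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


if __name__ == "__main__":
    # Hypothetical values for illustration only (not from the NACC analysis).
    chi2_m, df_m = 150.0, 50     # target factor model
    chi2_b, df_b = 2000.0, 66    # baseline (independence) model
    n = 1001                     # sample size
    print(f"RMSEA = {rmsea(chi2_m, df_m, n):.3f}")
    print(f"CFI   = {cfi(chi2_m, df_m, chi2_b, df_b):.3f}")
    print(f"TLI   = {tli(chi2_m, df_m, chi2_b, df_b):.3f}")
```

Because CFI is capped at 1 and TLI is not, a model whose chi-square falls below its degrees of freedom gets CFI = 1 exactly but a TLI slightly above 1.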