Background
The Ruff Figural Fluency Test (RFFT) is a valid but time-consuming and labour-intensive paper-and-pencil cognitive test. A digital RFFT was developed that participants can complete independently on an iPad with an Apple Pencil, with RFFT scores computed automatically. We investigated the validity and reliability of this digital RFFT.
Methods
We randomly allocated participants to start with either the digital or the paper-and-pencil RFFT. Immediately after the first test, they completed the other test (cross-over design). Participants were invited for a second digital RFFT 1 week later. For the digital RFFT, an automatic algorithm and two independent raters (the criterion standard) assessed the number of unique designs (UD) and perseverative errors (PE). The same raters also scored the paper-and-pencil RFFT. We used intraclass correlation coefficients (ICC), sensitivity, specificity, percentage agreement, kappa, and Bland–Altman plots.
Results
We included 94 participants (mean (SD) age 39.9 (14.8) years; 73.4% follow-up). Mean (SD) UD and median (IQR) PE on the digital RFFT were 84.2 (26.0) and 4 (2–7.3), respectively. Agreement between manual and automatic scoring of the digital RFFT was high for UD (ICC = 0.99, 95% CI 0.98, 0.99; sensitivity = 0.98; specificity = 0.96) and for PE (ICC = 0.99, 95% CI 0.98, 0.99; sensitivity = 0.90; specificity = 1.00), indicating excellent criterion validity. A small but statistically significant difference in UD was found between automatic and manual scoring (mean difference: −1.12, 95% CI −1.92, −0.33). The digital and paper-and-pencil RFFT showed moderate agreement for UD (ICC = 0.73, 95% CI 0.34, 0.87) and poor agreement for PE (ICC = 0.47, 95% CI 0.30, 0.62). Participants produced fewer UD on the digital than on the paper-and-pencil RFFT (mean difference: −7.09, 95% CI −11.80, −2.38).
A higher number of UD on the digital RFFT was associated with higher education (Spearman’s r = 0.43, p < 0.001) and younger age (Pearson’s r = −0.36, p < 0.001), showing that the test can discriminate between age categories and levels of education. Test–retest reliability was moderate (ICC = 0.74, 95% CI 0.61, 0.83).
Conclusions
The automatic scoring of the digital RFFT has good criterion and convergent validity. Agreement between the digital and paper-and-pencil RFFT was low and test–retest reliability was moderate, which can be explained by learning effects. The digital RFFT is a valid and reliable instrument for measuring executive cognitive function in the general population and is a feasible alternative to the paper-and-pencil RFFT in large-scale studies. However, its scores cannot be used interchangeably with paper-and-pencil RFFT scores.
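
Two of the agreement statistics named in the Methods, the ICC and the Bland–Altman limits of agreement, can be sketched as follows. This is a minimal illustration and not the authors' analysis code. The function names are hypothetical, the example scores are invented for demonstration and are not study data, and the ICC form (two-way random effects, absolute agreement, single measure) is an assumption, since the abstract does not state which ICC model was used.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired scores a and b.

    Bias is the mean of the paired differences a - b; the limits of
    agreement are bias +/- 1.96 sample standard deviations of the differences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def icc_absolute_agreement(a, b):
    """Two-way random-effects, absolute-agreement, single-measure ICC
    for two scoring methods (k = 2) applied to the same n participants.

    Assumed ICC form; the scores must not all be identical.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n, k = len(a), 2
    grand = mean(list(a) + list(b))
    row_means = [(x + y) / 2 for x, y in zip(a, b)]
    col_means = [mean(a), mean(b)]

    # Sums of squares for participants (rows), methods (columns), and total.
    ss_rows = k * sum((r - grand) ** 2 for r in row_means)
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(a) + list(b))
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Invented example scores (NOT study data): unique designs per participant.
automatic = [80, 95, 60, 110, 72]
manual = [82, 96, 60, 112, 73]

bias, lower, upper = bland_altman(automatic, manual)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
print(f"ICC={icc_absolute_agreement(automatic, manual):.3f}")
```

Because the absolute-agreement form penalises systematic offsets between methods, a constant shift in scores (such as the lower UD counts on the digital test) lowers the ICC even when the rank order of participants is preserved.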