We investigated the agreement between automated and gold-standard manual transcriptions of telephone chatbot-based semantic verbal fluency testing. We examined 78 cases from the Screening over Speech in Unselected Populations for Clinical Trials in AD (PROSPECT-AD) study, including cognitively normal individuals and individuals with subjective cognitive decline, mild cognitive impairment, and dementia. We used Bayesian Bland-Altman analysis to assess agreement for word count and for the qualitative features of semantic cluster size, cluster switches, and word frequencies. We found high levels of agreement for word count, with a 93% probability that a newly observed difference falls below the minimally important difference. The qualitative features showed fair levels of agreement. Word count discriminated well between cognitively impaired and unimpaired individuals, regardless of transcription mode. Our results support the use of automated speech recognition, particularly for the assessment of quantitative speech features, even when the data come from telephone calls with cognitively impaired individuals in their homes.
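
As a rough illustration of the agreement analysis described above, the sketch below fits a simple Bayesian normal model to paired manual-versus-automated word-count differences and estimates the probability that a newly observed difference falls within a minimally important difference (MID). The synthetic data, the MID value, and the conjugate-prior model are assumptions for illustration only; they are not the PROSPECT-AD data or the authors' exact analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired word counts (manual vs. automated transcription) for 78 cases;
# placeholder data, not the PROSPECT-AD measurements.
manual = rng.poisson(20, size=78)
automated = manual + rng.integers(-2, 3, size=78)
diffs = automated - manual

n = diffs.size
xbar, s2 = diffs.mean(), diffs.var(ddof=1)

# Posterior draws under a normal model with a Jeffreys prior:
# sigma^2 | data ~ Scaled-Inv-Chi2(n - 1, s2);  mu | sigma^2, data ~ N(xbar, sigma^2 / n)
n_draws = 20_000
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, n_draws)
mu = rng.normal(xbar, np.sqrt(sigma2 / n))

# Posterior predictive draws of a newly observed difference
d_new = rng.normal(mu, np.sqrt(sigma2))

# Bayesian limits of agreement (2.5% / 97.5% of the predictive distribution)
loa = np.percentile(d_new, [2.5, 97.5])

# Probability that a new difference lies within the minimally important difference
MID = 3  # placeholder value; the study-specific MID is not given in the abstract
p_within = np.mean(np.abs(d_new) < MID)

print(f"Bias: {xbar:.2f}, LoA: [{loa[0]:.2f}, {loa[1]:.2f}], P(|diff| < MID) = {p_within:.2f}")
```

In this sketch, the reported probability plays the role of the 93% figure quoted above: it summarizes how likely a future manual-versus-automated word-count difference is to stay within a clinically negligible range.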