Smart speaker devices are appealing to consumers, but the perceived usefulness (PU) of the multimodal voice experience is not fully understood. The purpose of this study was to evaluate the extent to which cognitive load, the relevance of visual information, and personality influence the PU of multimodal voice assistant technology in a within-subjects repeated measures design. A multimodal voice prototype was created to answer the question, “What are some extreme weather conditions?” Nine variants were included, combining 3, 5, or 7 system responses with relevant, irrelevant, or no information presented on a screen. Three tasks were embedded within each condition (a Stroop task, sorting M&Ms, and no task). The subjective and objective measures were PU score, recall, personality scores, and fluctuations in galvanic skin response (GSR) values. The findings suggest that participants perceive the voice assistant experience as more useful when there are fewer responses (words) to attend to and subsequently recall, and when relevant visual feedback is available to aid that recall. The task performed was marginally significant in determining PU. Scores for conscientiousness, openness to experience, agreeableness, and neuroticism predicted some of the variation in PU responses, whereas GSR data did not. UX designers of multimodal interfaces are strongly encouraged to create succinct voice responses accompanied by relevant visual feedback, and to keep the main use cases of these products in mind, in order to increase the PU of the experience and the subsequent behavioral intention to use the product.