ABSTRACTTo examine whether integrating large language models (LLMs) into library reference services can provide equitable services to users regardless of gender and race, we simulated interactions using names indicative of gender and race to evaluate biases across three different sizes of the Llama 2 model. Tentative results indicated that gender test accuracy (54.9%) and racial bias test accuracy (28.5%) are approximately at chance level, suggesting LLM‐powered reference services can provide equitable services. However, word frequency analysis showed some slight differences in language use across gender and race groups. Model size analysis showed that biases did not decrease as model size increased. These tentative results highlight a positive outlook on integrating LLMs into reference services, while underscoring the need for cautious AI integration and ongoing bias monitoring.