
In this study, I demonstrate how religion and theology can be useful for testing the performance of LLMs or LLM-powered chatbots, focusing on the measurement of philosophical skills. I present the results of testing four chatbots: ChatGPT, Bing, Bard, and Llama2. I draw on three sources of inspiration from religion and theology: 1) the theory of the four senses of Scripture; 2) abstract theological statements; and 3) an abstract logic formula derived from a religious text. I use them to show that such sources are good material for tasks that effectively measure philosophical skills such as interpretation of a given fragment, creative deductive reasoning, and identification of ontological limitations. This approach enabled sensitive testing, revealing differences in performance among the four chatbots. I also provide an example of how to build a benchmark for rating and comparing such skills, using assessment criteria and simplified scales to rate each chatbot on each criterion.

