Large Language Model(LLM) has shown amazing abilities in reasoning tasks, theory of mind(ToM) has been tested in many studies as part of reasoning tasks, and social learning, which is closely related to theory of mind, are still lack of investigation. However, the test methods and materials make the test results unconvincing. We propose a dynamic gamified assessment(DGA) and hierarchical social learning measurement to test ToM and social learning capacities in LLMs. The test for ToM consists of five parts. First, we extract ToM tasks from ToM experiments and then design game rules to satisfy the ToM task requirement. After that, we design ToM questions to match the game’s rules and use these to generate test materials. Finally, we go through the above steps to test the model. To assess the social learning ability, we introduce a novel set of social rules (three in total). Experiment results demonstrate that, except GPT-4, LLMs performed poorly on the ToM test but showed a certain level of social learning ability in social learning measurement.
Read full abstract