Fig. 1: Performance of human (purple), GPT-4 (dark blue), GPT-3.5 (light blue) and LLaMA2-70B (green) on the battery of theory of mind tests. | Nature Human Behaviour

From: Testing theory of mind in large language models and humans
